[
https://issues.apache.org/jira/browse/ARROW-15254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469382#comment-17469382
]
Weston Pace commented on ARROW-15254:
-------------------------------------
Yes, that would be a difficult situation to handle, especially since we have no
control over the size of the last block.
One thing we might be able to do is check the file size up front so we always
know how many bytes are remaining. Then we could change our chunking logic so
that instead of a small trailing final block we have an overly large final
block which is always {{>= block_size && < 2*block_size}}. Then we could
simply throw an error if we encounter a file where the footer is larger than
the block size. There is no way to check this at reader creation time, since
footer size is measured in lines while block size is measured in bytes, but I
imagine the situation would be quite rare.
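A rough sketch of that chunking idea, written in Python rather than Arrow's C++
for brevity. The names {{plan_blocks}} and {{drop_footer}} are hypothetical and
are not Arrow APIs. The footer helper also assumes that the final block starts
on a line boundary and that no quoted field contains a newline, which a real
CSV reader could not take for granted.

```python
def plan_blocks(file_size: int, block_size: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that cover file_size bytes.

    Every block is exactly block_size bytes except the last one, which
    absorbs the remainder. When file_size >= block_size the last block
    therefore satisfies block_size <= length < 2 * block_size.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if file_size == 0:
        return []

    n_full = file_size // block_size
    if n_full <= 1:
        # The whole file fits in one (possibly oversized) block.
        return [(0, file_size)]

    # All but the last full block keep their normal size.
    blocks = [(i * block_size, block_size) for i in range(n_full - 1)]
    # The last full block is merged with the trailing remainder.
    last_offset = (n_full - 1) * block_size
    blocks.append((last_offset, file_size - last_offset))
    return blocks


def drop_footer(last_block: bytes, footer_lines: int) -> bytes:
    """Strip the trailing footer_lines lines from the final block.

    Raises ValueError when the block holds fewer lines than the footer,
    i.e. the footer would spill into an earlier block.
    """
    if footer_lines < 0:
        raise ValueError("footer_lines must be non-negative")
    lines = last_block.splitlines(keepends=True)
    if footer_lines > len(lines):
        raise ValueError("footer is larger than the final block")
    return b"".join(lines[: len(lines) - footer_lines])
```

For example, a 10-byte file with a 4-byte block size would be read as one
4-byte block followed by one 6-byte final block, instead of two 4-byte blocks
and a 2-byte tail. The file size passed to {{plan_blocks}} is the value that
would have to be queried up front.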
It would add some complexity, but it shouldn't have much impact on performance,
although it would add a touch of latency because we'd need to query for the
file size. CSV blocks are typically small enough that having a slightly
oversized final block shouldn't be a problem.
> [C++] Ability to skip CSV footer when reading in dataset
> --------------------------------------------------------
>
> Key: ARROW-15254
> URL: https://issues.apache.org/jira/browse/ARROW-15254
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Nicola Crane
> Priority: Major
>
> In ARROW-15252 a user reports wanting to be able to skip the final row of a
> CSV (the footer) when reading in a dataset of CSVs - is this possible to
> implement?