[ 
https://issues.apache.org/jira/browse/ARROW-15254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469382#comment-17469382
 ] 

Weston Pace commented on ARROW-15254:
-------------------------------------

Yes, that would be a difficult situation to handle, especially since we have no 
control over the size of the last block.

One thing we might be able to do is check the file size up front so we always 
know how many bytes are remaining.  Then we could change our chunking logic so 
that instead of a small trailing final block we have an overly large final 
block which is always {{>= block_size && < 2*block_size}}.  Then we could 
simply throw an error if we encounter a file where the footer is larger than 
the block size.  There is no way to check this at reader creation time since 
the footer size is given in lines and the block size in bytes, but I imagine 
the situation would be quite rare.
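
As a rough illustration of the chunking idea above, here is a minimal sketch 
in Python (not the actual C++ reader code).  The names {{plan_blocks}} and 
{{drop_footer}} are hypothetical, and the footer handling assumes rows end in 
a plain newline with no quoted newlines inside fields.

```python
from typing import List, Tuple


def plan_blocks(file_size: int, block_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs covering a file of file_size bytes.

    Every block is exactly block_size bytes except the last one, which
    absorbs the remainder and is therefore >= block_size and
    < 2 * block_size (or shorter only when the whole file is smaller
    than block_size).
    """
    if block_size <= 0 or file_size < 0:
        raise ValueError("block_size must be positive and file_size non-negative")
    blocks = []
    offset = 0
    # Emit a regular block only while two full blocks' worth of bytes remain,
    # so whatever is left over ends up in a single oversized final block.
    while file_size - offset >= 2 * block_size:
        blocks.append((offset, block_size))
        offset += block_size
    if file_size - offset > 0:
        blocks.append((offset, file_size - offset))
    return blocks


def drop_footer(final_block: bytes, footer_rows: int) -> bytes:
    """Strip the last footer_rows lines from the final block.

    Raises ValueError when the footer cannot be shown to start inside this
    block.  This is deliberately conservative: a block with no newline
    before the footer is rejected, since the footer may have begun earlier.
    """
    if footer_rows == 0:
        return final_block
    # Ignore a single trailing newline so it is not counted as an empty row.
    data = final_block[:-1] if final_block.endswith(b"\n") else final_block
    end = len(data)
    for _ in range(footer_rows):
        cut = data.rfind(b"\n", 0, end)
        if cut == -1:
            raise ValueError("footer does not fit in the final block")
        end = cut
    # Keep the newline that terminates the last real row.
    return data[: end + 1]
```

For example, {{plan_blocks(10, 4)}} yields a 4-byte block followed by a 6-byte 
final block, rather than blocks of 4, 4 and 2 bytes.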

It would add some complexity but it shouldn't have much impact on performance, 
although it would add a touch of latency because we'd need to query for the 
file size.  CSV blocks are typically small enough that having a slightly 
oversized final block shouldn't be a problem.

> [C++] Ability to skip CSV footer when reading in dataset
> --------------------------------------------------------
>
>                 Key: ARROW-15254
>                 URL: https://issues.apache.org/jira/browse/ARROW-15254
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Nicola Crane
>            Priority: Major
>
> In ARROW-15252 a user reports wanting to be able to skip the final row of a 
> CSV (the footer) when reading in a dataset of CSVs - is this possible to 
> implement?



