[jira] [Commented] (ARROW-7415) [C++][Dataset] Implement IpcFormat for sources composed of ipc files

Ben Kietzman (Jira) Tue, 17 Dec 2019 10:34:16 -0800


    [ https://issues.apache.org/jira/browse/ARROW-7415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998473#comment-16998473 ]


Ben Kietzman commented on ARROW-7415:
-------------------------------------

I think that before cracking open CSV, or writing in any format, we should 
tackle scanning IPC files. Ideally nothing breaks, adding IpcFormat takes a 
week, and we get "Mixing different file formats" as a bonus. Otherwise we find 
pain points in the scanner API *before* we're also using it to test the 
(probably much more complex) CsvFormat. The same reasoning applies to writing 
datasets: most of our tests will probably just assume working scanners and do 
round trips. Finally, although we started scanning with ParquetFormat, I think 
IpcFormat is the ideal format to start writing with, since there is so little 
impedance mismatch between it and in-memory structures.

> [C++][Dataset] Implement IpcFormat for sources composed of ipc files
> --------------------------------------------------------------------
>
>                 Key: ARROW-7415
>                 URL: https://issues.apache.org/jira/browse/ARROW-7415
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++ - Dataset
>    Affects Versions: 0.15.1
>            Reporter: Ben Kietzman
>            Assignee: Ben Kietzman
>            Priority: Major
>             Fix For: 1.0.0
>
>
> Currently only parquet is supported. IPC files make a nice test case for 
> multiple file formats since they also have a completely unambiguous physical 
> schema (unlike CSV) and support for reading/writing is already present.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
