[ https://issues.apache.org/jira/browse/ARROW-7415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998473#comment-16998473 ]
Ben Kietzman commented on ARROW-7415:
-------------------------------------

I think before cracking open CSV or writing of any format we should tackle scanning IPC files. Ideally nothing breaks and adding IpcFormat takes a week and brings us "Mixing different file formats" as a bonus. Otherwise we find pain points in the scanner API *before* we're also using it to test the (probably much more complex) CsvFormat.

The same reasoning applies to writing datasets; most of our tests will probably just assume working scanners and do round trips.

Finally, although we started by scanning ParquetFormat I think IpcFormat is the ideal format to start working on writing since there is so little impedance mismatch between it and in-memory structures.

> [C++][Dataset] Implement IpcFormat for sources composed of ipc files
> --------------------------------------------------------------------
>
>                 Key: ARROW-7415
>                 URL: https://issues.apache.org/jira/browse/ARROW-7415
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++ - Dataset
>    Affects Versions: 0.15.1
>            Reporter: Ben Kietzman
>            Assignee: Ben Kietzman
>            Priority: Major
>             Fix For: 1.0.0
>
>
> Currently only parquet is supported. IPC files make a nice test case for
> multiple file formats since they also have a completely unambiguous physical
> schema (unlike CSV) and support for reading/writing is already present.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)