[ https://issues.apache.org/jira/browse/ARROW-7415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998473#comment-16998473 ]
Ben Kietzman commented on ARROW-7415:
-------------------------------------
I think that before cracking open CSV, or writing in any format, we should tackle
scanning IPC files. Ideally nothing breaks, adding IpcFormat takes a week,
and we get "Mixing different file formats" as a bonus. If not, we find the
pain points in the scanner API *before* we're also using it to test the
(probably much more complex) CsvFormat. The same reasoning applies to writing
datasets: most of our tests will probably just assume working scanners and do
round trips. Finally, although we started by scanning ParquetFormat, I think
IpcFormat is the ideal format to start with for writing, since there is so
little impedance mismatch between it and in-memory structures.
> [C++][Dataset] Implement IpcFormat for sources composed of ipc files
> --------------------------------------------------------------------
>
> Key: ARROW-7415
> URL: https://issues.apache.org/jira/browse/ARROW-7415
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++ - Dataset
> Affects Versions: 0.15.1
> Reporter: Ben Kietzman
> Assignee: Ben Kietzman
> Priority: Major
> Fix For: 1.0.0
>
>
> Currently only Parquet is supported. IPC files make a nice test case for
> multiple file formats, since they also have a completely unambiguous physical
> schema (unlike CSV) and support for reading and writing them is already present.