[
https://issues.apache.org/jira/browse/ARROW-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774507#comment-16774507
]
Wes McKinney commented on ARROW-4648:
-------------------------------------
I think that once we start the "C++ Dataset" project (I plan to post a design
doc to the mailing list soon), we should organize the "data source"
implementations as follows:
* arrow/dataset/csv
* arrow/dataset/json
* arrow/dataset/orc
* arrow/dataset/parquet
We must place these "file scanners" under a common framework, or they will not
be usable in the context of a query engine.
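As a rough illustration of what "a common framework" could mean here, the sketch below puts format-specific scanners behind one abstract interface so that a consumer (standing in for a query engine) never needs to know which format it is reading. Every name in it (`FileScanner`, `Batch`, `InMemoryScanner`, `CountRows`) is hypothetical and invented for this example; it is not the Arrow API and not the design from the planned document.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// All names below are hypothetical; this is not the Arrow API.

// Stand-in for a batch of rows. A real design would use Arrow record batches.
struct Batch {
  std::vector<std::string> rows;
};

// Common interface that each format-specific scanner would implement.
class FileScanner {
 public:
  virtual ~FileScanner() = default;
  // Name of the format, e.g. "csv" or "parquet".
  virtual std::string format() const = 0;
  // Fills `out` with the next batch; returns false once the source is exhausted.
  virtual bool Next(Batch* out) = 0;
};

// Toy scanner that serves pre-loaded rows in fixed-size batches.
// A real csv/json/orc/parquet scanner would parse a file instead.
class InMemoryScanner : public FileScanner {
 public:
  InMemoryScanner(std::string format, std::vector<std::string> rows,
                  std::size_t batch_size)
      : format_(std::move(format)),
        rows_(std::move(rows)),
        // Guard against a zero batch size, which would never make progress.
        batch_size_(std::max<std::size_t>(batch_size, 1)) {}

  std::string format() const override { return format_; }

  bool Next(Batch* out) override {
    out->rows.clear();
    if (pos_ >= rows_.size()) return false;
    const std::size_t end = std::min(pos_ + batch_size_, rows_.size());
    out->rows.assign(rows_.begin() + pos_, rows_.begin() + end);
    pos_ = end;
    return true;
  }

 private:
  std::string format_;
  std::vector<std::string> rows_;
  std::size_t batch_size_;
  std::size_t pos_ = 0;
};

// Stand-in for a query engine: it only sees the FileScanner interface,
// so sources of different formats can be consumed uniformly.
std::size_t CountRows(const std::vector<std::unique_ptr<FileScanner>>& sources) {
  std::size_t total = 0;
  Batch batch;
  for (const auto& source : sources) {
    while (source->Next(&batch)) total += batch.rows.size();
  }
  return total;
}
```

The point of the sketch is only that the consumer depends on the interface, which is what a shared arrow/dataset/ layout would make natural; batching, schemas, and error handling are deliberately left out.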
> [C++/Question] Naming/organizational inconsistencies in cpp codebase
> --------------------------------------------------------------------
>
> Key: ARROW-4648
> URL: https://issues.apache.org/jira/browse/ARROW-4648
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Krisztian Szucs
> Priority: Major
>
> Even now that my eyes are used to the codebase, I still find the naming and/or
> code organization inconsistent.
> h2. File Formats
> So arrow already supports a couple of file formats, namely parquet, feather,
> json, csv, orc, but their placement in the codebase is quite odd:
> - parquet: src/parquet
> - feather: src/arrow/ipc/feather
> - orc: src/arrow/adapters/orc
> - csv: src/arrow/csv
> - json: src/arrow/json
> I might misunderstand the purpose of these sources, but I'd expect them to be
> organized under the same roof.
> h2. Inter-Process-Communication vs. Flight
> Given the name "ipc", I'd expect that module to provide Flight's functionality.
> Flight's placement is a bit odd too: because it has its own codename, it
> should be placed under cpp/src, like parquet, plasma, or gandiva.