[ https://issues.apache.org/jira/browse/METAMODEL-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680225#comment-14680225 ]
ASF GitHub Bot commented on METAMODEL-163:
------------------------------------------
GitHub user asfgit closed the pull request at:
https://github.com/apache/metamodel/pull/37
> Composite/directory Resource for local files and HDFS files
> -----------------------------------------------------------
>
> Key: METAMODEL-163
> URL: https://issues.apache.org/jira/browse/METAMODEL-163
> Project: Apache MetaModel
> Issue Type: Improvement
> Reporter: Kasper Sørensen
>
> An increasingly common pattern in representing data is to have a directory
> of files in the same format that can be appended together to form a
> complete dataset. I see this especially in Hadoop scenarios, where reducers
> as well as Spark will usually create such "part" files in a directory and
> treat that directory almost as a logical file.
> I don't know whether we can generalize this or whether we need two separate
> implementations. But at least I would love to have a Resource implementation
> like this: given a (local or HDFS) path that points to a directory, or maybe
> also a wildcard-enabled expression, I would want a single Resource
> object that represents all the matching files in that directory/pattern.
> This would not only give us better interoperability with Hadoop
> result data, but would also solve a long-standing request (in our
> company at least) to support multiple CSV files in one logical CsvDataContext.