Kasper Sørensen created METAMODEL-163:
-----------------------------------------

             Summary: Composite/directory Resource for local files and HDFS files
                 Key: METAMODEL-163
                 URL: https://issues.apache.org/jira/browse/METAMODEL-163
             Project: Apache MetaModel
          Issue Type: Improvement
            Reporter: Kasper Sørensen


An increasingly common pattern in representing data is to have a directory of 
files in the same format which can be concatenated to form a complete 
dataset. I see this especially in Hadoop scenarios, where reducers as well as 
Spark jobs usually create such "part" files in a directory and treat that 
directory almost as a logical file.

I don't know if we can generalize this or if we need two separate 
implementations (one for local files, one for HDFS). Either way, I would love 
to have a Resource implementation like this: given a (local or HDFS) path that 
points to a directory, or maybe also a wildcard-enabled expression, I would 
want a single Resource object that represents all the matching files in that 
directory/pattern.
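
To make the idea concrete, here is a rough sketch for the local-file case 
only, using nothing but the JDK. The class name DirectoryResourceSketch and 
its methods are hypothetical and are not part of the MetaModel API; it does 
not implement the Resource interface and does not cover HDFS. It assumes the 
part files should be read in file-name order.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch, not part of the MetaModel API: a directory read as one logical file. */
public class DirectoryResourceSketch {

    private final Path directory;
    private final String glob;

    /** @param glob a glob pattern such as "part-*", or "*" for every file */
    public DirectoryResourceSketch(Path directory, String glob) {
        this.directory = directory;
        this.glob = glob;
    }

    /** Lists the matching regular files, sorted so that part files keep a stable order. */
    public List<Path> listParts() throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
        }
        Collections.sort(parts);
        return parts;
    }

    /** Returns one stream that reads every part back to back; each file is opened only when reached. */
    public InputStream read() throws IOException {
        final Iterator<Path> it = listParts().iterator();
        return new SequenceInputStream(new Enumeration<InputStream>() {
            @Override
            public boolean hasMoreElements() {
                return it.hasNext();
            }

            @Override
            public InputStream nextElement() {
                try {
                    return Files.newInputStream(it.next());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        });
    }
}
```

Plain byte concatenation like this would repeat the header line if every CSV 
part carries its own header, so a real implementation would probably need to 
expose the individual parts to the reader as well.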

This would not only give us better interoperability with Hadoop result 
data, but would also solve a long-standing request (in our company 
at least) to support multiple CSV files in one logical CsvDataContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
