[ 
https://issues.apache.org/jira/browse/GRIFFIN-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johnnie updated GRIFFIN-278:
----------------------------
    Description: 
The Griffin data connector is designed to compare a dataset's accuracy between 
source and target.

However, in the big data ecosystem, most sources are huge and often consist of 
hundreds of files in one folder. I think it would be great if Griffin could 
handle a source as a folder, rather than as a single file, by default.

In addition, Spark normally reads data from a whole folder, so in this case we 
don't need to union all the files in that folder.

  was:
The Griffin data connector is designed to compare a dataset's accuracy between 
source and target.

However, in the big data ecosystem, most sources are huge and often consist of 
hundreds of files in one folder. I think it would be great if Griffin could 
handle a source as a folder, rather than as a single file.

 


> AvroBatchDataConnector handle input is directory
> ------------------------------------------------
>
>                 Key: GRIFFIN-278
>                 URL: https://issues.apache.org/jira/browse/GRIFFIN-278
>             Project: Griffin
>          Issue Type: Improvement
>            Reporter: Johnnie
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
