[ https://issues.apache.org/jira/browse/GRIFFIN-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johnnie updated GRIFFIN-278:
----------------------------
    Description:
The Griffin data connector is designed to compare a dataset's accuracy between source and target.

However, in the big data ecosystem, most sources are huge and have hundreds of files in one folder. I think it would be great if Griffin could handle the source by folder instead of by file by default.

In addition, Spark normally reads data from a folder, so in this case we don't need to union all the files in one folder.

  was:
The Griffin data connector is designed to compare a dataset's accuracy between source and target.

However, in the big data ecosystem, most sources are huge and have hundreds of files in one folder. I think it would be great if Griffin could handle the source by folder instead of by file.

> AvroBatchDataConnector handle input is directory
> ------------------------------------------------
>
>                 Key: GRIFFIN-278
>                 URL: https://issues.apache.org/jira/browse/GRIFFIN-278
>             Project: Griffin
>          Issue Type: Improvement
>            Reporter: Johnnie
>            Priority: Major
>
> The Griffin data connector is designed to compare a dataset's accuracy
> between source and target.
> However, in the big data ecosystem, most sources are huge and have
> hundreds of files in one folder. I think it would be great if Griffin could
> handle the source by folder instead of by file by default.
> In addition, Spark normally reads data from a folder, so in this case we
> don't need to union all the files in one folder.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)