[ 
https://issues.apache.org/jira/browse/GRIFFIN-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chitral Verma reassigned GRIFFIN-297:
-------------------------------------

    Assignee: Chitral Verma

> Allow support for additional file based data sources
> ----------------------------------------------------
>
>                 Key: GRIFFIN-297
>                 URL: https://issues.apache.org/jira/browse/GRIFFIN-297
>             Project: Griffin
>          Issue Type: Sub-task
>            Reporter: Chitral Verma
>            Assignee: Chitral Verma
>            Priority: Major
>              Labels: features
>             Fix For: 0.6.0
>
>          Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> In the current version of Apache Griffin (0.5.0), support for file-based 
> data sources is very limited: only Avro and text files are supported.
> I propose adding support for additional file-based data sources such as 
> Parquet, CSV, TSV, and ORC in batch mode. Since most of these sources 
> already have first-class support in Spark, the implementation is 
> straightforward.
> This feature will also allow data to be read directly from standalone files 
> as well as from directories on both local and distributed filesystems.
> A sample config would look like this:
> {noformat}
> {
>   "name": "source",
>   "baseline": true,
>   "connectors": [
>     {
>       "type": "file",
>       "version": "1.7",
>       "config": {
>         "format": "parquet",
>         "options": { 
>           "k1": "v1",
>           "k2": "v2"
>         },
>         "paths": [
>           "/home/chitral/path/to/source/",
>           "/home/chitral/path/to/test.parquet"
>         ]
>       }
>     }
>   ]
> }
> {noformat}
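
To illustrate how a connector entry like the one above might be interpreted, here is a minimal Python sketch. It is not Griffin's implementation. The names `parse_file_connector`, `load_with_spark`, `SUPPORTED_FORMATS`, and `SAMPLE` are hypothetical, and treating "tsv" as CSV with a tab separator is an assumption, not something the ticket specifies. The Spark call in `load_with_spark` assumes a PySpark `SparkSession` is passed in.

```python
import json

# Formats named in the ticket. Which ones Griffin finally accepts is an assumption here.
SUPPORTED_FORMATS = {"parquet", "csv", "tsv", "orc", "avro", "text"}

# The sample config from the ticket, unchanged.
SAMPLE = """
{
  "name": "source",
  "baseline": true,
  "connectors": [
    {
      "type": "file",
      "version": "1.7",
      "config": {
        "format": "parquet",
        "options": {"k1": "v1", "k2": "v2"},
        "paths": [
          "/home/chitral/path/to/source/",
          "/home/chitral/path/to/test.parquet"
        ]
      }
    }
  ]
}
"""


def parse_file_connector(connector: dict) -> dict:
    """Validate a 'file' connector and return format, options, and paths."""
    if connector.get("type") != "file":
        raise ValueError(f"not a file connector: {connector.get('type')!r}")
    config = connector.get("config", {})
    fmt = str(config.get("format", "")).lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    paths = list(config.get("paths", []))
    if not paths:
        raise ValueError("at least one path is required")
    options = dict(config.get("options", {}))
    if fmt == "tsv":
        # Assumption: read TSV through the CSV reader with a tab separator,
        # unless the user already set a separator explicitly.
        fmt = "csv"
        options.setdefault("sep", "\t")
    return {"format": fmt, "options": options, "paths": paths}


def load_with_spark(spark, plan: dict):
    """Read all paths into one DataFrame using Spark's generic reader."""
    reader = spark.read.format(plan["format"]).options(**plan["options"])
    # load() accepts a list, so files and directories can be mixed.
    return reader.load(plan["paths"])


plan = parse_file_connector(json.loads(SAMPLE)["connectors"][0])
```

With a live session, `load_with_spark(spark, plan)` would read the directory and the standalone file into a single DataFrame. Passing "options" straight through to the reader keeps the connector independent of any one format's settings.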



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
