[ 
https://issues.apache.org/jira/browse/GRIFFIN-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chitral Verma updated GRIFFIN-297:
----------------------------------
    Description: 
In the current version of Apache Griffin (0.5.0), support for file-based data
sources is very limited: only Avro and Text files are supported.

I propose a feature that adds support for additional file-based data sources
such as Parquet, CSV, TSV and ORC, in both batch and streaming mode. Since most
of these formats already have first-class support provided by task, the
implementation is straightforward.

This feature will also allow data to be read directly from standalone files
as well as from directories.

A sample config would look like this:
{noformat}
{
  "name": "source",
  "baseline": true,
  "connectors": [
    {
      "type": "file",
      "version": "1.7",
      "config": {
        "format": "parquet",
        "paths": [
          "/home/chitral/path/to/source/",
          "/home/chitral/path/to/test.parquet"
        ]
      }
    }
  ]
}{noformat}
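
To illustrate how a connector entry of this shape could be consumed, here is a minimal sketch in Python. It is not Griffin's implementation: the function name `file_connectors` and the `SUPPORTED_FORMATS` set are hypothetical, and the accepted formats are assumed from the list in this proposal. Note that a strict JSON parser rejects a trailing comma after the last array element, so the sample embedded in the sketch omits one.

```python
import json

# Hypothetical set of accepted formats: the two already supported (Avro, text)
# plus the ones named in the proposal. Not taken from Griffin's code.
SUPPORTED_FORMATS = {"avro", "text", "parquet", "csv", "tsv", "orc"}


def file_connectors(source_json):
    """Return a list of (format, paths) for every connector of type "file"."""
    source = json.loads(source_json)
    result = []
    for connector in source.get("connectors", []):
        if connector.get("type") != "file":
            continue  # other connector types are out of scope for this sketch
        config = connector.get("config", {})
        fmt = config.get("format", "").lower()
        paths = config.get("paths", [])
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {fmt!r}")
        if not paths:
            raise ValueError("a file connector needs at least one path")
        # Each path may name a single file or a directory; both are kept as-is.
        result.append((fmt, list(paths)))
    return result


SAMPLE = """
{
  "name": "source",
  "baseline": true,
  "connectors": [
    {
      "type": "file",
      "version": "1.7",
      "config": {
        "format": "parquet",
        "paths": [
          "/home/chitral/path/to/source/",
          "/home/chitral/path/to/test.parquet"
        ]
      }
    }
  ]
}
"""

if __name__ == "__main__":
    for fmt, paths in file_connectors(SAMPLE):
        print(fmt, paths)
```

The resulting format and path list are the kind of values a reader API would need. How they are actually passed to the processing engine is left open here.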

  was:
In the current version of Apache Griffin (0.5.0), support for file-based data
sources is very limited: only Avro and Text files are supported.

I propose a feature that adds support for additional file-based data sources
such as Parquet, CSV, TSV and ORC, in both batch and streaming mode. Since most
of these formats already have first-class support provided by task, the
implementation is straightforward.

This feature will also allow data to be read directly from standalone files
as well as from directories.


> Allow support for File based data sources
> -----------------------------------------
>
>                 Key: GRIFFIN-297
>                 URL: https://issues.apache.org/jira/browse/GRIFFIN-297
>             Project: Griffin
>          Issue Type: New Feature
>            Reporter: Chitral Verma
>            Priority: Major
>              Labels: features
>
> In the current version of Apache Griffin (0.5.0), support for file-based
> data sources is very limited: only Avro and Text files are supported.
> I propose a feature that adds support for additional file-based data sources
> such as Parquet, CSV, TSV and ORC, in both batch and streaming mode. Since
> most of these formats already have first-class support provided by task, the
> implementation is straightforward.
> This feature will also allow data to be read directly from standalone files
> as well as from directories.
> A sample config would look like this:
> {noformat}
> {
>   "name": "source",
>   "baseline": true,
>   "connectors": [
>     {
>       "type": "file",
>       "version": "1.7",
>       "config": {
>         "format": "parquet",
>         "paths": [
>           "/home/chitral/path/to/source/",
>           "/home/chitral/path/to/test.parquet"
>         ]
>       }
>     }
>   ]
> }{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
