[ https://issues.apache.org/jira/browse/GRIFFIN-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chitral Verma reassigned GRIFFIN-297:
-------------------------------------
Assignee: Chitral Verma
> Allow support for additional file based data sources
> ----------------------------------------------------
>
> Key: GRIFFIN-297
> URL: https://issues.apache.org/jira/browse/GRIFFIN-297
> Project: Griffin
> Issue Type: Sub-task
> Reporter: Chitral Verma
> Assignee: Chitral Verma
> Priority: Major
> Labels: features
> Fix For: 0.6.0
>
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> In the current version of Apache Griffin (0.5.0), support for file-based
> data sources is very limited: only Avro and text files are supported.
> I propose adding support for additional file-based data sources such as
> Parquet, CSV, TSV, and ORC in batch mode. Since most of these sources
> already have first-class support in Spark, the implementation is
> straightforward.
> This feature will also allow data to be read directly from standalone files
> as well as from directories, on both local and distributed filesystems.
> A sample config would look like:
> {noformat}
> {
>   "name": "source",
>   "baseline": true,
>   "connectors": [
>     {
>       "type": "file",
>       "version": "1.7",
>       "config": {
>         "format": "parquet",
>         "options": {
>           "k1": "v1",
>           "k2": "v2"
>         },
>         "paths": [
>           "/home/chitral/path/to/source/",
>           "/home/chitral/path/to/test.parquet"
>         ]
>       }
>     }
>   ]
> }
> {noformat}
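
As a rough illustration of how a connector block like the one above could map onto Spark's generic reader, here is a minimal Python sketch. It is not Griffin's implementation: the helper names `read_plan` and `load`, the `SUPPORTED_FORMATS` set, and the choice to treat "tsv" as CSV with a tab separator are all assumptions made for this example. Only `load` touches Spark, and it expects an existing PySpark `SparkSession`.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical allow-list: the new formats named in the proposal.
SUPPORTED_FORMATS = {"parquet", "csv", "tsv", "orc"}


def read_plan(connector: Dict[str, Any]) -> Tuple[str, Dict[str, str], List[str]]:
    """Turn a "file" connector block into (format, options, paths)."""
    if connector.get("type") != "file":
        raise ValueError("expected a connector of type 'file'")

    config = connector.get("config", {})
    fmt = str(config.get("format", "")).lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")

    paths = list(config.get("paths", []))
    if not paths:
        raise ValueError("at least one path is required")

    options = dict(config.get("options", {}))
    if fmt == "tsv":
        # Assumption: read TSV through Spark's CSV source with a tab separator,
        # unless the user already set a separator explicitly.
        fmt = "csv"
        options.setdefault("sep", "\t")

    return fmt, options, paths


def load(spark, connector: Dict[str, Any]):
    """Read every configured path into one DataFrame using the given SparkSession."""
    fmt, options, paths = read_plan(connector)
    # DataFrameReader.load accepts a list of paths (files or directories).
    return spark.read.format(fmt).options(**options).load(paths)
```

With a session in hand, `load(spark, source["connectors"][0])` would read both the directory and the single Parquet file from the sample config into one DataFrame. Keeping the validation in `read_plan` separate from the Spark call means the config handling can be checked without a running cluster.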
--
This message was sent by Atlassian Jira
(v8.3.4#803005)