[
https://issues.apache.org/jira/browse/GRIFFIN-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chitral Verma updated GRIFFIN-297:
----------------------------------
Description:
In the current version of Apache Griffin (0.5.0), support for file-based data
sources is very limited: only Avro and Text files are supported.
I propose adding support for additional file-based data sources such as
Parquet, CSV, TSV, and ORC in both batch and streaming mode. Since most of
these sources already have first-class support in Spark, the implementation
is straightforward.
This feature will also allow data to be read directly from standalone files
as well as from directories.
A sample config would look like this:
{noformat}
{
  "name": "source",
  "baseline": true,
  "connectors": [
    {
      "type": "file",
      "version": "1.7",
      "config": {
        "format": "parquet",
        "paths": [
          "/home/chitral/path/to/source/",
          "/home/chitral/path/to/test.parquet"
        ]
      }
    }
  ]
}{noformat}
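To illustrate how such a connector entry could be interpreted, here is a
minimal sketch in plain Python. It is not Griffin code: the function names
(parse_file_connector, expand_paths) and the SUPPORTED_FORMATS set are
assumptions made for this example, and an actual implementation would
presumably pass the format and paths to the engine's own file readers
rather than list files by hand.

```python
import json
from pathlib import Path
from typing import List, Tuple

# Assumed set of formats: the ones named in this proposal plus the
# Avro and Text support that already exists.
SUPPORTED_FORMATS = {"avro", "text", "parquet", "csv", "tsv", "orc"}


def parse_file_connector(connector: dict) -> Tuple[str, List[str]]:
    """Validate one connector entry and return (format, paths)."""
    if connector.get("type") != "file":
        raise ValueError("not a file connector: %r" % connector.get("type"))
    config = connector.get("config", {})
    fmt = str(config.get("format", "")).lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError("unsupported format: %r" % fmt)
    paths = list(config.get("paths", []))
    if not paths:
        raise ValueError("at least one path is required")
    return fmt, paths


def expand_paths(paths: List[str]) -> List[str]:
    """Replace each directory with the files directly inside it (sorted);
    keep every other path unchanged."""
    files = []
    for raw in paths:
        p = Path(raw)
        if p.is_dir():
            files.extend(sorted(str(c) for c in p.iterdir() if c.is_file()))
        else:
            files.append(str(p))
    return files


# The sample config from this issue, written as strict JSON.
source = json.loads("""
{
  "name": "source",
  "baseline": true,
  "connectors": [
    {
      "type": "file",
      "version": "1.7",
      "config": {
        "format": "parquet",
        "paths": [
          "/home/chitral/path/to/source/",
          "/home/chitral/path/to/test.parquet"
        ]
      }
    }
  ]
}
""")

fmt, paths = parse_file_connector(source["connectors"][0])
print(fmt, expand_paths(paths))
```

Note that strict JSON parsers reject a trailing comma after the last entry
of "paths", so the list has to end without one.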
was:
In the current version of Apache Griffin (0.5.0), support for file-based data
sources is very limited: only Avro and Text files are supported.
I propose adding support for additional file-based data sources such as
Parquet, CSV, TSV, and ORC in both batch and streaming mode. Since most of
these sources already have first-class support in Spark, the implementation
is straightforward.
This feature will also allow data to be read directly from standalone files
as well as from directories.
> Allow support for File based data sources
> -----------------------------------------
>
> Key: GRIFFIN-297
> URL: https://issues.apache.org/jira/browse/GRIFFIN-297
> Project: Griffin
> Issue Type: New Feature
> Reporter: Chitral Verma
> Priority: Major
> Labels: features
--
This message was sent by Atlassian Jira
(v8.3.4#803005)