[ 
https://issues.apache.org/jira/browse/GRIFFIN-297?focusedWorklogId=342633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-342633
 ]

ASF GitHub Bot logged work on GRIFFIN-297:
------------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Nov/19 14:35
            Start Date: 13/Nov/19 14:35
    Worklog Time Spent: 10m 
      Work Description: chitralverma commented on pull request #555: [WIP] 
[GRIFFIN-297] Allow support for additional file based data sources
URL: https://github.com/apache/griffin/pull/555
 
 
   **What changes were proposed in this pull request?**
   
   This PR extends file based data source support beyond Avro and Text to 
additional formats (Parquet, ORC, etc.). 
   
    - Allows users to specify additional file based data sources like Parquet, 
CSV, TSV, ORC, etc.
    - The above formats are supported in both batch and streaming modes. 
    - Allows data to be read directly from stand-alone files as well as 
directories present in both local and distributed file systems.
    - Allows users to specify the schema directly through options (useful for 
CSV/TSV types).
   
   A sample config looks like this:
   
   ```
   {
     "name": "source",
     "baseline": true,
     "connectors": [
       {
         "type": "file",
         "version": "1.7",
         "config": {
           "format": "parquet",
           "options": { 
             "k1": "v1",
             "k2": "v2"
           },
           "paths": [
             "/home/chitral/path/to/source/",
             "/home/chitral/path/to/test.parquet"
           ]
         }
       }
     ]
   }
   
   ```
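As a rough illustration of how a connector of this shape might be interpreted, the Python sketch below turns the `config` block into a format, an options map, and a path list. It is not taken from the patch: `reader_plan`, `SUPPORTED_FORMATS`, and the handling of TSV as tab-separated CSV are hypothetical names and assumptions made for illustration only.

```python
import json

# Hypothetical set of formats, taken from the formats named in this description.
SUPPORTED_FORMATS = {"avro", "text", "parquet", "orc", "csv", "tsv"}


def reader_plan(connector):
    """Return (format, options, paths) for a connector of type "file".

    Raises ValueError when the connector is not usable as a file source.
    """
    if connector.get("type") != "file":
        raise ValueError("not a file connector")

    config = connector.get("config", {})
    fmt = str(config.get("format", "")).lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")

    paths = list(config.get("paths", []))
    if not paths:
        raise ValueError("at least one path is required")

    # Copy the options so the caller's config is never mutated.
    options = dict(config.get("options", {}))

    # Assumption: TSV is treated as CSV with a tab separator, unless the
    # user already supplied a separator of their own.
    if fmt == "tsv":
        fmt = "csv"
        options.setdefault("sep", "\t")

    return fmt, options, paths


if __name__ == "__main__":
    source = json.loads("""
    {
      "name": "source",
      "baseline": true,
      "connectors": [
        {
          "type": "file",
          "version": "1.7",
          "config": {
            "format": "parquet",
            "options": {"k1": "v1", "k2": "v2"},
            "paths": [
              "/home/chitral/path/to/source/",
              "/home/chitral/path/to/test.parquet"
            ]
          }
        }
      ]
    }
    """)
    for connector in source["connectors"]:
        print(reader_plan(connector))
```

In Spark terms, the three returned values would correspond to the arguments of `DataFrameReader.format`, `options`, and `load`. Whether the patch wires them together exactly this way is not stated in this description.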
   **Does this PR introduce any user-facing change?**
   No
   
   **How was this patch tested?**
   Griffin test suite. Additional unit tests have also been added.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

            Worklog Id:     (was: 342633)
    Remaining Estimate: 0h
            Time Spent: 10m

> Allow support for additional file based data sources
> ----------------------------------------------------
>
>                 Key: GRIFFIN-297
>                 URL: https://issues.apache.org/jira/browse/GRIFFIN-297
>             Project: Griffin
>          Issue Type: Improvement
>            Reporter: Chitral Verma
>            Priority: Major
>              Labels: features
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the current version of Apache Griffin (0.5.0), support for file based data 
> sources is very limited: only Avro and Text files are supported. 
> I propose a feature that adds support for additional file based data sources 
> like Parquet, CSV, TSV, ORC, etc. in both batch and streaming modes. Since most 
> of the above sources already have first-class support in Spark, the 
> implementation is straightforward.
> This feature will also allow data to be read directly from stand-alone files 
> as well as directories present in both local and distributed filesystems.
> A sample config would look like this:
> {noformat}
> {
>   "name": "source",
>   "baseline": true,
>   "connectors": [
>     {
>       "type": "file",
>       "version": "1.7",
>       "config": {
>         "format": "parquet",
>         "options": { 
>           "k1": "v1",
>           "k2": "v2"
>         },
>         "paths": [
>           "/home/chitral/path/to/source/",
>           "/home/chitral/path/to/test.parquet"
>         ]
>       }
>     }
>   ]
> }{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
