[ 
https://issues.apache.org/jira/browse/SQOOP-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Xu reassigned SQOOP-1390:
------------------------------

    Assignee: Qian Xu

> Convert Sqoop format to Parquet format via MapReduce
> ----------------------------------------------------
>
>                 Key: SQOOP-1390
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1390
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: tools
>            Reporter: Qian Xu
>            Assignee: Qian Xu
>
> Parquet files keep data in contiguous chunks by column, so appending new 
> records to a dataset requires either rewriting substantial portions of an 
> existing file or buffering records to create a new file. While Parquet may 
> have storage and query benefits, it therefore doesn't make sense to write 
> to it directly from record-based tools. We'd consider using the Kite SDK to 
> simplify the handling of Parquet-specific details.
> The major areas of work are:
> * Implement ParquetImportMapper
> * Hook up the ParquetOutputFormat and ParquetImportMapper in the import job.
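
The buffering point in the description can be illustrated with a minimal sketch. This is a toy model only, not the Parquet, Kite SDK, or Sqoop API: the class name ColumnChunkBuffer, its methods, and the flush_size parameter are all hypothetical, and the "files" are plain in-memory dicts. It shows why a record-at-a-time writer has to hold rows back until it can lay them out as contiguous per-column chunks.

```python
class ColumnChunkBuffer:
    """Toy illustration (hypothetical, not a real Parquet writer).

    Records arrive one row at a time, but the output keeps each column's
    values contiguous, so rows are buffered and pivoted on flush.
    """

    def __init__(self, columns, flush_size):
        self.columns = list(columns)
        self.flush_size = flush_size
        self.rows = []    # row-oriented records waiting to be written
        self.files = []   # each "file": column name -> contiguous list of values

    def write(self, record):
        # A single record cannot be appended to an existing column chunk
        # without rewriting it, so hold it until a full batch is available.
        self.rows.append(record)
        if len(self.rows) >= self.flush_size:
            self.flush()

    def flush(self):
        # Pivot the buffered rows into per-column lists and emit a new "file".
        if not self.rows:
            return
        chunk = {name: [row[name] for row in self.rows] for name in self.columns}
        self.files.append(chunk)
        self.rows = []


if __name__ == "__main__":
    buf = ColumnChunkBuffer(["id", "name"], flush_size=2)
    for rec in [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]:
        buf.write(rec)
    buf.flush()  # write out the remaining partial batch
    print(buf.files)
```

Each flush produces a new file rather than extending an old one, which matches the trade-off described above: small batches mean many small files, large batches mean more memory held per writer.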



--
This message was sent by Atlassian JIRA
(v6.2#6252)
