[ https://issues.apache.org/jira/browse/MAPREDUCE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron Kimball updated MAPREDUCE-1446:
-------------------------------------
Attachment: MAPREDUCE-1446.patch
Attaching a patch that provides this functionality. The main challenge with
BLOB and CLOB data is that it can produce very large records -- larger than
will fit in memory all at once. The current patch proposes two serializations
for CLOB/BLOB data:
* Data less than 16MB will be stored inline in the record bodies.
* Data greater than 16MB will be stored in separate files in HDFS; the records
will contain only a pointer to the file. The data is then accessed through an
InputStream interface so that users can buffer as much data as is appropriate.
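As a rough illustration only (this is not code from the attached patch), a
Writable wrapper for the dual encoding might look like the sketch below. The
LobRef class name, the marker boolean, and the helper signatures are
placeholders of my own:

{code}
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

/** Hypothetical wrapper for a BLOB column; names are illustrative only. */
public class LobRef implements Writable {
  private static final int INLINE_MAX = 16 * 1024 * 1024; // 16MB boundary

  private byte[] inlineData;   // set when the value fits inline
  private String externalPath; // set when the value is spilled to HDFS

  @Override
  public void write(DataOutput out) throws IOException {
    if (inlineData != null && inlineData.length <= INLINE_MAX) {
      out.writeBoolean(true);           // marker: bytes follow inline
      out.writeInt(inlineData.length);
      out.write(inlineData);
    } else {
      out.writeBoolean(false);          // marker: record holds only a pointer
      Text.writeString(out, externalPath);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    if (in.readBoolean()) {
      inlineData = new byte[in.readInt()];
      in.readFully(inlineData);
      externalPath = null;
    } else {
      externalPath = Text.readString(in);
      inlineData = null;
    }
  }

  /** Stream access, so callers buffer only as much data as is appropriate. */
  public InputStream getDataStream(Configuration conf) throws IOException {
    if (inlineData != null) {
      return new ByteArrayInputStream(inlineData);
    }
    Path p = new Path(externalPath);
    return p.getFileSystem(conf).open(p);
  }
}
{code}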
The second of these two mechanisms (external storage) is unimplemented, but
placeholders have been left in the code where necessary. The boundary size
(16MB) is intended to be a load-time parameter. It is currently hardcoded, but
it would be trivial to let users configure it to suit their datasets,
hardware, etc.
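For the configurable boundary, reading it from the job Configuration would
suffice; the property name below is invented for this example, since the patch
currently hardcodes the value:

{code}
// Hypothetical property name -- not defined by the patch.
long inlineLobLimit = conf.getLong(
    "sqoop.inline.lob.length.max", 16 * 1024 * 1024); // default 16MB
{code}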
> Sqoop should support CLOB and BLOB datatypes
> --------------------------------------------
>
> Key: MAPREDUCE-1446
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1446
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: contrib/sqoop
> Reporter: Aaron Kimball
> Assignee: Aaron Kimball
> Attachments: MAPREDUCE-1446.patch
>
>
> Sqoop should allow import of CLOB and BLOB based data.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.