[ https://issues.apache.org/jira/browse/MAPREDUCE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron Kimball updated MAPREDUCE-1446:
-------------------------------------
Attachment: MAPREDUCE-1446.patch
Attaching a patch that provides this functionality. The main challenge with
BLOB and CLOB data is that it can produce very large records -- larger than
will fit in memory all at once. The current patch proposes two serializations
for CLOB/BLOB data:
* Data less than 16MB will be stored inline in the record bodies.
* Data greater than 16MB will be stored in separate files in HDFS; the records
will contain only a pointer to the file. The data is then accessed through an
InputStream interface so that users can buffer as much data as is appropriate.
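As a rough illustration only (this is not code from the attached patch), a
Writable wrapper for the dual encoding might look like the sketch below. The
LobRef class name, the marker boolean, and the helper signatures are
placeholders of my own:

{code}
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

/** Hypothetical wrapper for a BLOB column; names are illustrative only. */
public class LobRef implements Writable {
  private static final int INLINE_MAX = 16 * 1024 * 1024; // 16MB boundary

  private byte[] inlineData;   // set when the value fits inline
  private String externalPath; // set when the value is spilled to HDFS

  @Override
  public void write(DataOutput out) throws IOException {
    if (inlineData != null && inlineData.length <= INLINE_MAX) {
      out.writeBoolean(true);           // marker: bytes follow inline
      out.writeInt(inlineData.length);
      out.write(inlineData);
    } else {
      out.writeBoolean(false);          // marker: record holds only a pointer
      Text.writeString(out, externalPath);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    if (in.readBoolean()) {
      inlineData = new byte[in.readInt()];
      in.readFully(inlineData);
      externalPath = null;
    } else {
      externalPath = Text.readString(in);
      inlineData = null;
    }
  }

  /** Stream access, so callers buffer only as much data as is appropriate. */
  public InputStream getDataStream(Configuration conf) throws IOException {
    if (inlineData != null) {
      return new ByteArrayInputStream(inlineData);
    }
    Path p = new Path(externalPath);
    return p.getFileSystem(conf).open(p);
  }
}
{code}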
The second of these two mechanisms (external storage) is unimplemented, but
placeholders have been left in the code where necessary. The boundary size
(16MB) is intended to be a load-time parameter. It is currently hardcoded, but
it would be trivial to let users configure it to suit their datasets,
hardware, etc.
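For the configurable boundary, reading it from the job Configuration would
suffice; the property name below is invented for this example, since the patch
currently hardcodes the value:

{code}
// Hypothetical property name -- not defined by the patch.
long inlineLobLimit = conf.getLong(
    "sqoop.inline.lob.length.max", 16 * 1024 * 1024); // default 16MB
{code}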
> Sqoop should support CLOB and BLOB datatypes
> --------------------------------------------
>
> Key: MAPREDUCE-1446
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1446
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: contrib/sqoop
> Reporter: Aaron Kimball
> Assignee: Aaron Kimball
> Attachments: MAPREDUCE-1446.patch
>
>
> Sqoop should allow import of CLOB and BLOB based data.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.