[ https://issues.apache.org/jira/browse/MAPREDUCE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-1524:
-------------------------------------

    Attachment: MAPREDUCE-1524.patch

Attaching a patch which provides this functionality. This completes the 
ClobRef/BlobRef interface added in MAPREDUCE-1446. These objects can reference 
inline data; if the data is too large, however, the actual values are imported 
to separate files in HDFS. They are placed in a {{_lobs}} directory underneath 
the import path. The relative filename is then preserved in the inline import 
data, so that the objects can later be reconstructed in user-side code.
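To illustrate the inline-vs-external split described above, here is a minimal sketch of a large-object reference. The class and field names are hypothetical stand-ins, not the actual ClobRef/BlobRef API from the patch:

```java
// Hypothetical sketch of a LOB reference: small values stay inline in the
// record text, large values are replaced by a pointer into the _lobs
// directory. The real ClobRef/BlobRef format in the patch may differ.
class LobRefSketch {
    private final String inlineData;   // non-null when the value fit inline
    private final String externalPath; // non-null when spilled to _lobs/<file>

    private LobRefSketch(String inlineData, String externalPath) {
        this.inlineData = inlineData;
        this.externalPath = externalPath;
    }

    // Values under the size limit are kept directly in the record.
    static LobRefSketch inline(String data) {
        return new LobRefSketch(data, null);
    }

    // Larger values are written to a file under _lobs; only the relative
    // filename is preserved in the import data.
    static LobRefSketch external(String relativeFile) {
        return new LobRefSketch(null, "_lobs/" + relativeFile);
    }

    boolean isExternal() {
        return externalPath != null;
    }

    @Override
    public String toString() {
        return isExternal() ? "externalLob(" + externalPath + ")" : inlineData;
    }
}
```

User-side code would check {{isExternal()}} before deciding whether to read the value directly or open the referenced HDFS file.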

This import process makes use of the FileOutputCommitter's ability to promote 
side-channel work files from only succeeded tasks. LOB files are named with the 
attempt id embedded to prevent collisions.
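As an illustration of the collision-avoidance point, one way to embed a task attempt id in a LOB filename might look like the following; the helper name and exact naming scheme are illustrative, not necessarily what the patch uses:

```java
// Illustrative only: embedding the task attempt id in the LOB filename
// guarantees that concurrent (e.g., speculative) attempts writing side
// files never collide, since attempt ids are unique per attempt.
class LobFileName {
    static String forAttempt(String attemptId, long sequence) {
        return "_lobs/lobfile_" + attemptId + "." + sequence;
    }
}
```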

The import process itself now includes a delayed component. Using the 
FileOutputCommitter, the FileSystem for the current Context/Configuration, 
etc., requires access to the MapContext, which is unavailable inside 
DBWritable's {{readFields(ResultSet)}} method. This patch therefore adds 
another method to {{SqoopRecord}}, {{loadLargeObjects()}}, which is called in 
the import {{map()}} method to propagate the current Context into a 
{{LargeObjectLoader}}.
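The two-phase pattern above can be sketched as follows, with hypothetical stand-in types (the real code uses Hadoop's MapContext and Sqoop's generated record classes, which are not reproduced here):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a LargeObjectLoader built from the live task Context;
// in the real patch this is where the FileOutputCommitter work path
// and FileSystem come from.
class LargeObjectLoaderSketch {
    private final String workPath;
    LargeObjectLoaderSketch(String workPath) { this.workPath = workPath; }
    String workPath() { return workPath; }
}

// Stand-in for a generated SqoopRecord subclass.
class SqoopRecordSketch {
    final List<String> pendingLobs = new ArrayList<>();
    final List<String> loadedLobs = new ArrayList<>();

    // readFields(ResultSet) cannot reach the MapContext, so LOB column
    // values can only be queued here...
    void readFields(String lobColumnValue) {
        pendingLobs.add(lobColumnValue);
    }

    // ...and materialized later, inside map(), once a loader built from
    // the live Context is available.
    void loadLargeObjects(LargeObjectLoaderSketch loader) {
        for (String lob : pendingLobs) {
            loadedLobs.add(loader.workPath() + "/" + lob);
        }
        pendingLobs.clear();
    }
}
```

The import {{map()}} method would construct the loader from its Context once, then call {{loadLargeObjects()}} on each record before writing it out.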

This addition to the {{SqoopRecord}} interface makes this an incompatible 
change: previously generated SqoopRecord objects cannot be used with this 
version of Sqoop, so users will need to regenerate their classes (e.g., with 
{{\-\-generate-only}}) before reusing data stored in SqoopRecord instances. 
The on-disk layout of existing SqoopRecord instances is unaffected.

Unit tests are added for LargeObjectLoader, ClobRef, and BlobRef to verify all 
of the above functionality.

Users can control the threshold at which CLOB/BLOB fields are no longer 
materialized inline with the {{\-\-inline-lob-limit}} argument. The default 
is 16 MB.
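The threshold decision itself reduces to a simple size comparison; the sketch below shows the idea with an illustrative method name, not Sqoop's actual API:

```java
// Sketch of the inline-LOB limit: values at or under the limit stay
// inline in the record; anything larger is spilled to a _lobs file.
// The 16 MB default matches the --inline-lob-limit default described
// above; the class and method names are hypothetical.
class InlineLobLimit {
    static final long DEFAULT_LIMIT = 16L * 1024 * 1024; // 16 MB

    static boolean storeInline(long lobByteLength, long limit) {
        return lobByteLength <= limit;
    }
}
```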

> Support for CLOB and BLOB values larger than can fit in memory
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-1524
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1524
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/sqoop
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1524.patch
>
>
> The patch in MAPREDUCE-1446 provides support for "inline" CLOB and BLOB 
> values which can be fully materialized. Values which are too big for RAM 
> should be written to separate files in HDFS and referenced in an indirect 
> fashion; access should be provided through a stream.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
