[
https://issues.apache.org/jira/browse/HBASE-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Harsh J Chouraria updated HBASE-3623:
-------------------------------------
Attachment: hbase.importtsv.xml.friendly.r1.diff
I've attached a patch (against trunk/) that uses Base64 encoding to achieve
this.
Perhaps this can be back-ported too. It greatly helps imports in scenarios
where one would otherwise have to translate the files (with tr, etc.) before
using this tool.
The existing test case for ImportTSV passes, and I have added a new one for
the ImportTSV mapper, which previously had no test at all.
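For readers who want a feel for the approach, here is a minimal sketch of the
idea. It is not the attached patch. It assumes java.util.Base64 (Java 8+) and
uses a plain Map as a stand-in for Hadoop's Configuration; the class name
SeparatorCodec and its methods are hypothetical. The separator is encoded
before it is stored under importtsv.separator, so only XML-safe characters are
serialized, and decoded again where the mapper would read it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of HBase.
class SeparatorCodec {
    // Property name taken from the -Dimporttsv.separator option in the issue.
    static final String SEPARATOR_KEY = "importtsv.separator";

    // Encode the raw separator so the stored value contains only
    // Base64 characters, all of which are representable in XML.
    static String encode(String separator) {
        byte[] raw = separator.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decode the stored value back into the original separator.
    static String decode(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for a job configuration (assumption: a simple map).
        Map<String, String> conf = new HashMap<>();

        // Submitter side: store the encoded escape character.
        conf.put(SEPARATOR_KEY, encode("\u001b"));

        // Mapper side: recover the original separator before parsing lines.
        String separator = decode(conf.get(SEPARATOR_KEY));
        System.out.println("stored value: " + conf.get(SEPARATOR_KEY));
        System.out.println("separator code point: " + (int) separator.charAt(0));
    }
}
```

The trade-off is that the stored property value is no longer human-readable,
so both the submitter and the mapper have to agree on the encoding.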
> Allow non-XML representable separator characters in the ImportTSV tool
> ----------------------------------------------------------------------
>
> Key: HBASE-3623
> URL: https://issues.apache.org/jira/browse/HBASE-3623
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 0.90.1
> Environment: Cloudera Hadoop/HBase (3B4)
> Reporter: Harsh J Chouraria
> Labels: import
> Fix For: 0.92.0
>
> Attachments: hbase.importtsv.xml.friendly.r1.diff
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> The importtsv tool currently fails if one passes a character that cannot be
> represented in XML as the separator (for example the escape character,
> \u001b, which is fairly common in use).
> {code}
> -Dimporttsv.separator=$'\x1b' # This param fails the submitter when serialized.
> {code}
> While this is a limitation of the Configuration class being serialized as
> XML, it can be circumvented by applying a suitable encoding that makes the
> string XML-compatible.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira