[
https://issues.apache.org/jira/browse/HBASE-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792455#comment-13792455
]
Istvan Vajnorak commented on HBASE-8593:
----------------------------------------
Dear Nick & Rajesh,
Thanks for your quick reply. I did not assume fixed width but rather some sort of
separator. Given that we had to interpret the elements in a type-safe manner, I
decided to leave the low-level byte[] parsing (which I liked, as it is more
performant than working with strings) and provided the input as a Text to my
"parser" through the following abstraction:
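import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.io.Text;

public interface InputParser extends Configurable {
  /**
   * Splits a single input line into its field values.
   *
   * @param value the raw input line
   * @return the field values, in column order
   */
  String[] parse(Text value);
}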
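For illustration, here is a minimal sketch of how a separator-based implementation of this idea could split a line while working on the raw bytes, decoding only each field's slice. It is not the uploaded mapper: SeparatorParser is a hypothetical name, and to keep the example runnable without Hadoop it takes the backing array plus valid length (the pair that Text.getBytes() and Text.getLength() return) instead of a Text, and it does not implement Configurable.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical separator-based parser (illustrative name, not from the patch).
 * It takes the backing array plus valid length, the pair Text.getBytes() and
 * Text.getLength() return, so it runs without Hadoop on the classpath.
 */
final class SeparatorParser {
  private final byte separator;

  SeparatorParser(byte separator) {
    this.separator = separator;
  }

  /** Splits bytes[0, length) on the separator byte; empty fields are kept. */
  String[] parse(byte[] bytes, int length) {
    List<String> fields = new ArrayList<>();
    int start = 0;
    for (int i = 0; i < length; i++) {
      if (bytes[i] == separator) {
        // Decode only this field's slice; the rest of the line stays as bytes.
        fields.add(new String(bytes, start, i - start, StandardCharsets.UTF_8));
        start = i + 1;
      }
    }
    // The last field runs from the final separator to the end of the valid data.
    fields.add(new String(bytes, start, length - start, StandardCharsets.UTF_8));
    return fields.toArray(new String[0]);
  }
}
```

Respecting the length argument matters because the array behind a Text can be longer than its valid contents. Splitting on a single ASCII byte is also safe for UTF-8 input, since every byte of a multi-byte UTF-8 sequence has its high bit set and so can never equal a tab or comma.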
After this, the mapper just ensured that each element of the array was matched
with its declared type, and invoked the relevant Bytes.toBytes() overload.
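The type-matching step could be sketched roughly as follows. The type names and the TypedEncoder class are hypothetical, and the mapping is kept in a lookup table keyed by type name rather than a dedicated enum. In a real mapper each conversion would call the matching Bytes.toBytes() overload; here ByteBuffer (big-endian by default) stands in so the example runs without HBase, and it should produce the same big-endian layout for int, long, float and double.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical type-dispatch step: pairs each parsed field with its declared
 * type and encodes it. ByteBuffer stands in for HBase's Bytes.toBytes().
 */
final class TypedEncoder {
  // Illustrative type names; these are not ImportTsv options.
  static final Map<String, Function<String, byte[]>> CONVERTERS = Map.of(
      "string", s -> s.getBytes(StandardCharsets.UTF_8),
      "int", s -> ByteBuffer.allocate(Integer.BYTES).putInt(Integer.parseInt(s)).array(),
      "long", s -> ByteBuffer.allocate(Long.BYTES).putLong(Long.parseLong(s)).array(),
      "float", s -> ByteBuffer.allocate(Float.BYTES).putFloat(Float.parseFloat(s)).array(),
      "double", s -> ByteBuffer.allocate(Double.BYTES).putDouble(Double.parseDouble(s)).array());

  /** Encodes fields[i] using types[i]; fails fast on a count mismatch or unknown type. */
  static byte[][] encode(String[] fields, String[] types) {
    if (fields.length != types.length) {
      throw new IllegalArgumentException(
          "expected " + types.length + " fields but got " + fields.length);
    }
    byte[][] out = new byte[fields.length][];
    for (int i = 0; i < fields.length; i++) {
      Function<String, byte[]> converter = CONVERTERS.get(types[i]);
      if (converter == null) {
        throw new IllegalArgumentException("unknown type: " + types[i]);
      }
      out[i] = converter.apply(fields[i]);
    }
    return out;
  }
}
```

This is exactly the difference the issue is about: declared as "string", the field "42" is stored as the two bytes 0x34 0x32, while declared as "int" it becomes the four bytes 00 00 00 2A.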
I have to admit that this narrowed things down, as I haven't tried it with any
kind of input other than text (no SequenceFile or direct binary input, which I
think ImportTsv supports).
I believe the Raw mappings would be the way to go, as they would probably avoid
having to maintain an additional, duplicated enum.
Looking at the solution we already have here, it already supports most of the
cases. The only thing the pattern I mentioned above could bring in is
proximity: the user wouldn't have to specify the types separately from the
column names. For this, however, regular expression matching would need to be
added, which might complicate the picture a bit.
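To show what that regular expression matching might involve, here is a rough sketch. The "family:qualifier(type)" notation and the ColumnSpec class are both hypothetical (the dialect used in the uploaded mapper is not described here). A column written without a type falls back to "string", so plain "family:qualifier" specs keep working; the sketch does not handle ImportTsv's special HBASE_ROW_KEY column.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical "proximity" column spec with the type written next to the
 * column, e.g. "d:count(int)". A missing type defaults to "string".
 */
final class ColumnSpec {
  // family ':' qualifier, optionally followed by '(' type ')'
  private static final Pattern SPEC =
      Pattern.compile("([^:()]+):([^()]+)(?:\\((\\w+)\\))?");

  final String family;
  final String qualifier;
  final String type;

  private ColumnSpec(String family, String qualifier, String type) {
    this.family = family;
    this.qualifier = qualifier;
    this.type = type;
  }

  /** Parses one column spec; the whole string must match the pattern. */
  static ColumnSpec parse(String spec) {
    Matcher m = SPEC.matcher(spec.trim());
    if (!m.matches()) {
      throw new IllegalArgumentException("bad column spec: " + spec);
    }
    String type = m.group(3) == null ? "string" : m.group(3);
    return new ColumnSpec(m.group(1), m.group(2), type);
  }
}
```

Even this small pattern has to decide on defaults and on which characters a qualifier may contain, which is the extra complexity the trade-off above refers to.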
I have uploaded the mapper I came up with for our situation; it might be useful
should the dialect of column and type specification we used prove useful in the
long run.
Best regards,
Istvan
> Type support in ImportTSV tool
> ------------------------------
>
> Key: HBASE-8593
> URL: https://issues.apache.org/jira/browse/HBASE-8593
> Project: HBase
> Issue Type: Sub-task
> Components: mapreduce
> Reporter: Anoop Sam John
> Assignee: rajeshbabu
> Fix For: 0.96.0
>
> Attachments: HBASE-8593.patch, HBASE-8593_v2.patch,
> HBASE-8593_v4.patch, ReportMapper.java
>
>
> Currently the ImportTSV tool treats all table columns as type String: it
> converts the input data into bytes assuming its type is String. Sometimes
> users need to add values of another type, say int or float, to the table
> using this tool.
--
This message was sent by Atlassian JIRA
(v6.1#6144)