[ 
https://issues.apache.org/jira/browse/HBASE-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792455#comment-13792455
 ] 

Istvan Vajnorak commented on HBASE-8593:
----------------------------------------

Dear Nick & Rajesh,

Thanks for your quick reply. I did not assume fixed width but rather some sort of 
separator. Given that we had to interpret the elements in a type-safe manner, I 
decided to abandon the low-level byte[] parsing (which I liked, given that it 
is more performant than working with strings) and pass the input as a Text 
to my "parser", behind this abstraction:

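import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.io.Text;

public interface InputParser extends Configurable {

        /**
         * Splits one input record into its fields.
         *
         * @param value the raw input line
         * @return the fields of the line, in input order
         */
        String[] parse(Text value);
}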

After this, the mapper just ensured that each element of the array was 
matched with the right type, and invoked the relevant Bytes.toBytes() 
overload.
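
To make the flow above concrete, here is a minimal sketch of the "split on a separator, then encode each field by its declared type" idea. It is not the attached ReportMapper.java. The class name, the FieldType enum and both method names are hypothetical, a plain String stands in for Text, and java.nio.ByteBuffer stands in for the Bytes.toBytes() overloads so that the snippet compiles without Hadoop or HBase on the classpath. The big-endian layout is assumed to match what Bytes.toBytes() produces.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical sketch; names are illustrative, not taken from the patch.
final class TypedFieldEncoder {

    // Hypothetical set of supported types.
    enum FieldType { STRING, INT, LONG, FLOAT, DOUBLE }

    // Stand-in for InputParser.parse(Text): split one line on a literal separator.
    static String[] parse(String line, String separator) {
        // Pattern.quote treats "|" and friends literally; -1 keeps trailing empty fields.
        return line.split(Pattern.quote(separator), -1);
    }

    // Stand-in for the Bytes.toBytes() overloads (ByteBuffer is big-endian by default).
    static byte[] encode(String field, FieldType type) {
        switch (type) {
            case INT:
                return ByteBuffer.allocate(Integer.BYTES)
                        .putInt(Integer.parseInt(field)).array();
            case LONG:
                return ByteBuffer.allocate(Long.BYTES)
                        .putLong(Long.parseLong(field)).array();
            case FLOAT:
                return ByteBuffer.allocate(Float.BYTES)
                        .putFloat(Float.parseFloat(field)).array();
            case DOUBLE:
                return ByteBuffer.allocate(Double.BYTES)
                        .putDouble(Double.parseDouble(field)).array();
            default:
                // STRING: encode the text as UTF-8.
                return field.getBytes(StandardCharsets.UTF_8);
        }
    }
}
```

With this sketch, a line such as "row1|42|1.5" would be split into three fields, and the second one encoded with FieldType.INT would become a 4-byte value instead of the 2-byte string "42".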

I have to admit that this narrowed things down, as I haven't tried it with any 
kind of input other than text (no sequence files or direct binary, which I 
think ImportTsv supports).

I believe the raw mappings would be the way to go, as they would probably 
remove the need to maintain an additional, duplicated enum.

Looking at the solution we already have here, it already supports most of 
the cases. The only thing the pattern I mentioned above could bring is 
proximity: the user wouldn't have to specify the types separately from the 
column names. For this, however, regular expression matching would need to be 
added, which might complicate the picture a bit.
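
As an illustration of that "proximity" idea, the sketch below parses a column specification that carries its type inline. The "family:qualifier:type" syntax, the class name and the default of "string" are all assumptions made up for this example. They are neither the syntax of the attached patches nor necessarily the dialect used in ReportMapper.java.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a column spec with an optional inline type suffix.
final class ColumnSpec {

    // family ":" qualifier, optionally followed by ":" type (letters only).
    private static final Pattern SPEC =
            Pattern.compile("([^:]+):([^:]+)(?::([A-Za-z]+))?");

    final String family;
    final String qualifier;
    final String type;

    private ColumnSpec(String family, String qualifier, String type) {
        this.family = family;
        this.qualifier = qualifier;
        this.type = type;
    }

    static ColumnSpec parse(String spec) {
        Matcher m = SPEC.matcher(spec);
        if (!m.matches()) {
            throw new IllegalArgumentException("Bad column spec: " + spec);
        }
        // Assumed default: a missing type suffix means the column is a string.
        String type = m.group(3) == null
                ? "string"
                : m.group(3).toLowerCase(Locale.ROOT);
        return new ColumnSpec(m.group(1), m.group(2), type);
    }
}
```

Under these assumptions, "d:age:int" would yield family "d", qualifier "age" and type "int", while "d:name" would fall back to "string". The cost is visible as well: a qualifier that itself contains a colon can no longer be expressed, which is the kind of complication mentioned above.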

I have uploaded the mapper I came up with for our situation; it might be 
useful should the dialect of column and type specification we used prove 
useful in the long run.

Best regards,
 Istvan


> Type support in ImportTSV tool
> ------------------------------
>
>                 Key: HBASE-8593
>                 URL: https://issues.apache.org/jira/browse/HBASE-8593
>             Project: HBase
>          Issue Type: Sub-task
>          Components: mapreduce
>            Reporter: Anoop Sam John
>            Assignee: rajeshbabu
>             Fix For: 0.96.0
>
>         Attachments: HBASE-8593.patch, HBASE-8593_v2.patch, 
> HBASE-8593_v4.patch, ReportMapper.java
>
>
> Currently the ImportTSV tool treats all table columns as type String and 
> converts the input data into bytes on that assumption. Sometimes the user 
> will need a column of another type, say int or float, to be added to the 
> table by this tool.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
