[ 
https://issues.apache.org/jira/browse/HADOOP-17141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HADOOP-17141:
------------------------------------
    Description: 
The Hadoop {{Text}} class contains a byte array that holds a UTF-8 
encoded string.  However, there is no way to quickly get the length of that 
string.  One can get the number of bytes in the byte array, but to determine 
the length of the String, the bytes must be decoded first.  In this simple 
example, which sorts {{Text}} objects by String length, the String has to be 
decoded from the byte array on every comparison.  This was brought to my 
attention by [HIVE-23870].

{code:java}
  public static void main(String[] args) {
    List<Text> list = Arrays.asList(new Text("1"), new Text("22"), new Text("333"));
    // Each comparison decodes both byte arrays just to read the String length.
    list.sort((Text t1, Text t2) -> t1.toString().length() - t2.toString().length());
  }
{code}

This would also be helpful for checking the last letter in the {{Text}} 
object repeatedly:

{code:java}
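    Text t = new Text("4444");
    // The last character is at length() - 1. Text.charAt takes a byte offset,
    // which equals the char index only for single-byte (ASCII) content.
    System.out.println(t.charAt(t.toString().length() - 1));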
{code}
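
For illustration only, here is a minimal sketch of how the String length could be derived from the UTF-8 bytes without decoding them. The class and method names ({{Utf8Length}}, {{utf16Length}}) are hypothetical and not part of Hadoop, and the sketch assumes the input is well-formed UTF-8. It counts every byte that is not a continuation byte, and counts 4-byte sequences twice because they decode to a surrogate pair in a Java String.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Length {

  // Hypothetical helper, not an existing Hadoop API. Returns the value that
  // String.length() would report for the decoded text, assuming the bytes in
  // [start, start + len) are well-formed UTF-8.
  public static int utf16Length(byte[] bytes, int start, int len) {
    int count = 0;
    for (int i = start; i < start + len; i++) {
      int b = bytes[i] & 0xFF;
      if ((b & 0xC0) != 0x80) { // not a continuation byte (10xxxxxx)
        count++;
        if (b >= 0xF0) {        // lead byte of a 4-byte sequence: surrogate pair
          count++;
        }
      }
    }
    return count;
  }

  public static void main(String[] args) {
    String[] samples = {"1", "22", "333", "h\u00e9llo", "\uD83D\uDE00"};
    for (String s : samples) {
      byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
      // Prints the decoded length next to the length computed from raw bytes.
      System.out.println(s.length() + " vs " + utf16Length(utf8, 0, utf8.length));
    }
  }
}
```

With a {{Text}} instance, the same loop could presumably run over {{t.getBytes()}} from offset 0 for {{t.getLength()}} bytes, since the backing array may be longer than the valid data.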

  was:
The Hadoop {{Text}} class contains a byte array that holds a UTF-8 
encoded string.  However, there is no way to quickly get the length of that 
string.  One can get the number of bytes in the byte array, but to determine 
the length of the String, the bytes must be decoded first.  In this simple 
example, which sorts {{Text}} objects by String length, the String has to be 
decoded from the byte array on every comparison.  This was brought to my 
attention by [HIVE-23870].

{code:java}
  public static void main(String[] args) {
    List<Text> list = Arrays.asList(new Text("1"), new Text("22"), new Text("333"));
    list.sort((Text t1, Text t2) -> t1.toString().length() - t2.toString().length());
  }
{code}


> Add Capability To Get Text Length
> ---------------------------------
>
>                 Key: HADOOP-17141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17141
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: common
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
