[ https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257688#comment-14257688 ]
Yongzhi Chen commented on HIVE-9201:
------------------------------------

Three rows are returned because the Hadoop method org.apache.hadoop.mapred.LineRecordReader.readDefaultLine treats both \r and \n as line terminators, so Hive needs to process the \r and \n characters before that method is called.

The map job uses the LazyUtils.writeEscaped method to escape special characters (such as control characters). The method simply adds the escape character before each character that needs escaping. There are two issues. First, \r and \n are not among the characters to be escaped. Second, even if they were added, they would need to be escaped differently: just adding an escape character (such as \) before them does not solve the problem, because the bytes with values 13 and 10 are still in the stream.

So these two characters should be handled differently. For example, replace '\r' with two characters: the escape character and the character 'r'. This logic can be added to the LazyUtils.writeEscaped method. The processed stream can then pass through org.apache.hadoop.mapred.LineRecordReader.readDefaultLine without logic errors (such as one row becoming 3 rows). Then, in the LazyString.init method, when we remove the escape characters, we know to convert '\' 'r' back to char 13.

I have attached the patch with the fix.

> Lazy functions do not handle newlines and carriage returns properly
> -------------------------------------------------------------------
>
>                 Key: HIVE-9201
>                 URL: https://issues.apache.org/jira/browse/HIVE-9201
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>
> Hive returns a wrong result when the returned string contains the character \r or \n.
> This happens when the query triggers MapReduce jobs.
> For example, for a table named strsim with only one row:
> As shown below, query 1 returns 1 row while query 2 returns 3 rows.
> Query 1:
> select "abc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
> Query 2:
> select "a\rb\nc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
>
> select "abc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
> INFO : Number of reduce tasks is set to 0 since there's no reduce operator
> INFO : Job running in-process (local Hadoop)
> INFO : 2014-12-23 15:00:08,958 Stage-1 map = 0%, reduce = 0%
> INFO : Ended Job = job_local1178499218_0015
> +------+---------+--+
> 1 row selected (1.283 seconds)
> | _c0  | narray  |
> +------+---------+--+
> | abc  | 1       |
> +------+---------+--+
>
> select "a\rb\nc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
> INFO : Number of reduce tasks is set to 0 since there's no reduce operator
> INFO : Job running in-process (local Hadoop)
> INFO : 2014-12-23 15:04:35,441 Stage-1 map = 0%, reduce = 0%
> INFO : Ended Job = job_local1816711099_0016
> +------+---------+--+
> 3 rows selected (1.135 seconds)
> | _c0  | narray  |
> +------+---------+--+
> | a    | NULL    |
> | b    | NULL    |
> | c    | 1       |
> +------+---------+--+

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
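
The escaping scheme proposed in the comment above can be sketched as follows. This is a minimal standalone illustration, not the attached patch and not the actual code of LazyUtils.writeEscaped or LazyString.init. The class name NewlineEscapeSketch and the methods escape and unescape are hypothetical, and a backslash is assumed as the escape character. The sketch also escapes the escape character itself so that the round trip is unambiguous.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: names and structure are illustrative, not Hive's API.
final class NewlineEscapeSketch {
    // Assumption: backslash is the configured escape character.
    static final byte ESCAPE = '\\';

    // Write side: replace CR and LF with two-byte sequences so that
    // no byte with value 13 or 10 remains in the output.
    static byte[] escape(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length + 8);
        for (byte b : in) {
            if (b == '\r') {
                out.write(ESCAPE);
                out.write('r');
            } else if (b == '\n') {
                out.write(ESCAPE);
                out.write('n');
            } else if (b == ESCAPE) {
                // Escape the escape character so 'r' and 'n' after it stay literal.
                out.write(ESCAPE);
                out.write(ESCAPE);
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    // Read side: turn ESCAPE 'r' back into CR and ESCAPE 'n' back into LF;
    // any other escaped byte is emitted as-is.
    static byte[] unescape(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            byte b = in[i];
            if (b == ESCAPE && i + 1 < in.length) {
                byte next = in[++i];
                if (next == 'r') {
                    out.write('\r');
                } else if (next == 'n') {
                    out.write('\n');
                } else {
                    out.write(next);
                }
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "a\rb\nc".getBytes(StandardCharsets.UTF_8);
        byte[] escaped = escape(raw);
        // The escaped form contains no CR or LF bytes, so a line-based
        // reader would see it as a single record.
        System.out.println(new String(escaped, StandardCharsets.UTF_8));
        System.out.println(java.util.Arrays.equals(raw, unescape(escaped)));
    }
}
```

With this scheme the value from query 2 is written as seven bytes with no line terminators, and the reader side restores the original five bytes after the record has been split.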