[ https://issues.apache.org/jira/browse/PIG-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-4340:
----------------------------
    Component/s: impl
    Fix Version/s: 0.15.0
    Assignee: Daniel Dai

> PigStorage fails parsing empty map.
> -----------------------------------
>
>                 Key: PIG-4340
>                 URL: https://issues.apache.org/jira/browse/PIG-4340
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Akira Murashita
>            Assignee: Daniel Dai
>            Priority: Minor
>             Fix For: 0.15.0
>
>
> I've found that PigStorage doesn't parse empty maps properly.
> I'm using pig-0.11.0-cdh4.4.0, but from reading the source code, it should
> be reproducible in later versions as well.
> An empty map in a field of a tuple is parsed as null.
> {code:title=test.txt}
> empty []
> nonempty [foo#bar]
> {code}
> {code:title=test.pig}
> A = LOAD '/tmp/test.txt' USING PigStorage(' ') AS (a:chararray, b:map[chararray]);
> DUMP A;
> {code}
> {code}
> $ pig test.pig
> ...
> (empty,)
> (nonempty,[foo#bar])
> {code}
> Moreover, if the empty map is nested in a parent field, the entire parent
> field is interpreted as null.
> {code:title=test-nested.txt}
> empty (f1,[])
> nonempty (f1,[foo#bar])
> {code}
> {code:title=test-nested.pig}
> A = LOAD '/tmp/test-nested.txt' USING PigStorage(' ') AS (a:chararray, t:(b:chararray, c:map[chararray]));
> DUMP A;
> {code}
> {code}
> $ pig test-nested.pig
> ...
> (empty,)
> (nonempty,(f1,[foo#bar]))
> {code}
> Investigating this, I've found that the cause is
> {{Utf8StorageConverter#consumeMap}}, which throws {{IOException}} when it
> receives an empty map as the string '[]'. It seems to always assume that the
> map has content, more specifically that a '#' character follows the key.
> {code:java}
>     private Map<String, Object> consumeMap(PushbackInputStream in, ResourceFieldSchema fieldSchema) throws IOException {
>         int buf;
>
>         while ((buf=in.read())!='[') {
>             if (buf==-1) {
>                 throw new IOException("Unexpect end of map");
>             }
>         }
>         HashMap<String, Object> m = new HashMap<String, Object>();
>         ByteArrayOutputStream mOut = new ByteArrayOutputStream(BUFFER_SIZE);
>         while (true) {
>             // Read key (assume key can not contains special character such as #, (, [, {, }, ], )
>             while ((buf=in.read())!='#') {
>                 if (buf==-1) {
>                     throw new IOException("Unexpect end of map");
>                 }
>                 mOut.write(buf);
>             }
>             String key = bytesToCharArray(mOut.toByteArray());
>             if (key.length()==0)
>                 throw new IOException("Map key can not be null");
> {code}
> I would appreciate it if you could fix this problem.
> Thanks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
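
For illustration only, here is a minimal sketch of one way the key-reading loop could tolerate an empty map: after consuming '[', peek at the next byte and return an empty map if it is ']'. This is not the patch committed for PIG-4340. The class name EmptyMapSketch and the helper parse are hypothetical, and the sketch keeps values as plain strings instead of converting them through a ResourceFieldSchema as the real Utf8StorageConverter does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class EmptyMapSketch {

    // Hypothetical, simplified stand-in for Utf8StorageConverter#consumeMap.
    // Assumption: values are flat strings; nested tuples, bags and maps are not handled.
    static Map<String, Object> consumeMap(PushbackInputStream in) throws IOException {
        int buf;
        // Skip ahead to the opening '['.
        while ((buf = in.read()) != '[') {
            if (buf == -1) {
                throw new IOException("Unexpected end of map");
            }
        }
        Map<String, Object> m = new HashMap<String, Object>();

        // Peek at the byte after '[': an immediate ']' means the map is empty.
        buf = in.read();
        if (buf == ']') {
            return m;
        }
        if (buf == -1) {
            throw new IOException("Unexpected end of map");
        }
        in.unread(buf); // not empty: push the byte back and parse entries

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            // Read the key up to '#'.
            while ((buf = in.read()) != '#') {
                if (buf == -1) {
                    throw new IOException("Unexpected end of map");
                }
                out.write(buf);
            }
            String key = new String(out.toByteArray(), StandardCharsets.UTF_8);
            if (key.isEmpty()) {
                throw new IOException("Map key can not be null");
            }
            out.reset();

            // Read the value up to ',' (next entry) or ']' (end of map).
            while ((buf = in.read()) != ',' && buf != ']') {
                if (buf == -1) {
                    throw new IOException("Unexpected end of map");
                }
                out.write(buf);
            }
            m.put(key, new String(out.toByteArray(), StandardCharsets.UTF_8));
            out.reset();
            if (buf == ']') {
                return m;
            }
        }
    }

    // Hypothetical convenience wrapper that parses a map literal from a string.
    static Map<String, Object> parse(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try {
            return consumeMap(new PushbackInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("[]"));        // prints {}
        System.out.println(parse("[foo#bar]")); // prints {foo=bar}
    }
}
```

With this check in place, '[]' yields an empty map rather than an IOException, while '[foo#bar]' is parsed as before. The peek relies on the one-byte pushback buffer that PushbackInputStream provides by default.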