Akira Murashita created PIG-4340:
------------------------------------
Summary: PigStorage fails parsing empty map.
Key: PIG-4340
URL: https://issues.apache.org/jira/browse/PIG-4340
Project: Pig
Issue Type: Bug
Reporter: Akira Murashita
Priority: Minor
I've found that PigStorage doesn't parse empty maps properly.
I'm using pig-0.11.0-cdh4.4.0, but from reading the source code, I believe it can
be reproduced in later versions as well.
An empty map in a field of a tuple is parsed as null.
{code:title=test.txt}
empty []
nonempty [foo#bar]
{code}
{code:title=test.pig}
A = LOAD '/tmp/test.txt' USING PigStorage(' ') AS (a:chararray, b:map[chararray]);
DUMP A;
{code}
{code}
$ pig test.pig
...
(empty,)
(nonempty,[foo#bar])
{code}
Moreover, if the empty map is nested in a parent field, the entire field is
interpreted as null.
{code:title=test-nested.txt}
empty (f1,[])
nonempty (f1,[foo#bar])
{code}
{code:title=test.pig}
A = LOAD '/tmp/test-nested.txt' USING PigStorage(' ') AS (a:chararray, t:(b:chararray, c:map[chararray]));
DUMP A;
{code}
{code}
$ pig test.pig
...
(empty,)
(nonempty,(f1,[foo#bar]))
{code}
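Investigating this, I found that the cause is that
{{Utf8StorageConverter#consumeMap}} throws an {{IOException}} when it receives an
empty map as the string '[]'. It seems to always assume that the map has content,
more specifically a '#' character: after the opening '[', the key loop treats ']'
as part of the key and keeps reading, looking for a '#' that never comes, until it
reaches the end of the input and throws.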
{code:java}
private Map<String, Object> consumeMap(PushbackInputStream in,
        ResourceFieldSchema fieldSchema) throws IOException {
    int buf;
    while ((buf=in.read())!='[') {
        if (buf==-1) {
            throw new IOException("Unexpect end of map");
        }
    }
    HashMap<String, Object> m = new HashMap<String, Object>();
    ByteArrayOutputStream mOut = new ByteArrayOutputStream(BUFFER_SIZE);
    while (true) {
        // Read key (assume key can not contains special character such as #, (, [, {, }, ], )
        while ((buf=in.read())!='#') {
            if (buf==-1) {
                throw new IOException("Unexpect end of map");
            }
            mOut.write(buf);
        }
        String key = bytesToCharArray(mOut.toByteArray());
        if (key.length()==0)
            throw new IOException("Map key can not be null");
        // ... (rest of consumeMap omitted)
{code}
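One possible fix is to peek at the byte right after '[' and return an empty map if it is ']'. Below is a minimal, self-contained sketch of that idea, not the actual Pig code: {{EmptyMapSketch}}, its {{consumeMap}} and {{parse}} are hypothetical names, values are kept as plain strings, and nested types, type conversion and {{fieldSchema}} are ignored.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for Utf8StorageConverter#consumeMap.
// Values are read as plain strings up to ',' or ']' (no nested tuples/bags/maps).
final class EmptyMapSketch {

    static Map<String, String> consumeMap(PushbackInputStream in) throws IOException {
        int buf;
        // Skip everything up to the opening '['.
        while ((buf = in.read()) != '[') {
            if (buf == -1) {
                throw new IOException("Unexpected end of map");
            }
        }
        Map<String, String> m = new HashMap<>();
        // Proposed change: peek at the next byte; ']' means the map is empty.
        buf = in.read();
        if (buf == ']') {
            return m;
        }
        if (buf != -1) {
            in.unread(buf); // not empty: push the byte back for the key loop
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            // Read the key up to '#'.
            out.reset();
            while ((buf = in.read()) != '#') {
                if (buf == -1) {
                    throw new IOException("Unexpected end of map");
                }
                out.write(buf);
            }
            String key = new String(out.toByteArray(), StandardCharsets.UTF_8);
            if (key.isEmpty()) {
                throw new IOException("Map key can not be null");
            }
            // Read the value up to ',' (next entry) or ']' (end of map).
            out.reset();
            while ((buf = in.read()) != ',' && buf != ']') {
                if (buf == -1) {
                    throw new IOException("Unexpected end of map");
                }
                out.write(buf);
            }
            m.put(key, new String(out.toByteArray(), StandardCharsets.UTF_8));
            if (buf == ']') {
                return m;
            }
        }
    }

    static Map<String, String> parse(String s) throws IOException {
        return consumeMap(new PushbackInputStream(
                new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8))));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parse("[]"));        // prints {}
        System.out.println(parse("[foo#bar]")); // prints {foo=bar}
    }
}
{code}

Peeking with {{PushbackInputStream#unread}} leaves the existing key-reading loop unchanged for non-empty maps, so only the '[]' case takes the new path.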
I would appreciate it if you could fix this problem.
Thanks.