[ https://issues.apache.org/jira/browse/PIG-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281070#comment-13281070 ]
Jonathan Coveney commented on PIG-2632:
---------------------------------------

Hmm, so I was testing my benchmarks, and the varint/varlong CPU cost is higher than the benchmark was capturing. For large longs, it can be as much as 3-4x slower (this came out of work for PIG-2638, and in that case I came up with a method that should give the same benefit and be more performant, but it won't apply to this case). I may just switch to a simple "store the whole long" approach and hope intermediate compression is turned on and effective, but that seems unsatisfying to me. Will ruminate on that.

Perhaps this is the part where Scott says Pig should use Avro for the intermediate serialization again :)

> Create a SchemaTuple which generates efficient Tuples via code gen
> ------------------------------------------------------------------
>
>                 Key: PIG-2632
>                 URL: https://issues.apache.org/jira/browse/PIG-2632
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.11
>
>         Attachments: PIG-2632-0.patch, PIG-2632-1.patch, PIG-2632-3.patch,
> schematuple benchmarking.pptx
>
>
> This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing
> the Schema on the frontend, we can code generate Tuples which can be used for
> fun and profit. In rudimentary tests, the memory efficiency is 2-4x better,
> and it's ~15% smaller serialized (this depends very heavily on the data,
> though). Need to do get/set tests, but assuming that it's on par with (or even
> faster than) Tuple, the memory gain is huge.
> Need to clean up the code and add tests.
> Right now, it generates a SchemaTuple for every inputSchema and outputSchema
> given to UDFs. The next step is to make a SchemaBag, where I think the
> serialization savings will be really huge.

--
This message is automatically generated by JIRA.
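For readers following the varint/varlong discussion in the comment above, here is a minimal sketch of the trade-off between a variable-length long and "store the whole long". It assumes a generic 7-bits-per-byte encoding with a continuation bit; this is not necessarily the encoding used in the PIG-2632 patches, and the class and method names (VarLongSketch, encodeVarLong, decodeVarLong, encodeFixedLong) are made up for illustration. It only compares sizes and shows the per-byte loop that large values have to go through; it does not measure CPU cost.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class VarLongSketch {
    // Hypothetical encoder: emits 7 bits per byte, low bits first.
    // The high bit of each byte is set when more bytes follow.
    static byte[] encodeVarLong(long v) {
        byte[] buf = new byte[10]; // a 64-bit value needs at most 10 groups of 7 bits
        int n = 0;
        while ((v & ~0x7FL) != 0) {
            buf[n++] = (byte) ((v & 0x7F) | 0x80);
            v >>>= 7; // unsigned shift, so negative values terminate too
        }
        buf[n++] = (byte) v;
        return Arrays.copyOf(buf, n);
    }

    // Inverse of encodeVarLong: reassembles the 7-bit groups.
    static long decodeVarLong(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        return result;
    }

    // "Store the whole long": always 8 bytes, no per-byte branching.
    static byte[] encodeFixedLong(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    public static void main(String[] args) {
        long[] samples = {1L, 300L, 1L << 40, Long.MAX_VALUE, -1L};
        for (long v : samples) {
            System.out.println(v + ": varlong=" + encodeVarLong(v).length
                    + " bytes, fixed=" + encodeFixedLong(v).length + " bytes");
        }
    }
}
```

Under this scheme small values shrink to one or two bytes, while values near the top of the long range take more bytes than the fixed form and run more loop iterations per value, which is consistent with large longs being the expensive case.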