[ https://issues.apache.org/jira/browse/HIVE-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767894#comment-13767894 ]
Michael Haeusler commented on HIVE-4943: ---------------------------------------- An awesome feature of Hive is the rich type system with excellent support for complex data-structures. To me, this ticket seems like a very useful extension to the hive built-ins. It is especially helpful for those users, that use complex data-structures. Right now, queries are often cumbersome when you access denormalized or nested data. E.g., let's consider a table that contains products together with their most popular accessories (cross-sellings). The order of the cross-selling products matter: {code:javascript} { "productId": 42, "name": "most awesome mp3 player", "manufacturer": "acme corp", "accessories": [ { "productId" : 23, "name": "batteries", "manufacturer": "acme corp" }, { "productId" : 25, "name": "extra load earphones", "manufacturer": "noisemakers inc" } ] } {code} Let's assume we want to know the average position in cross-sellings of the manufacturer "noisemakers inc". Surprisingly, this is not possible with hive built-ins. You could try to come up with a custom UDFSequence and a query like this: {code:sql} SELECT AVG(SEQUENCE(p.productId)) AS wrongAverage FROM products p LATERAL VIEW EXPLODE(p.accessories) pa AS accessory WHERE pa.accessory.manufacturer = 'noisemakers inc'; {code} Unfortunately, the above query will give us wrong results, because Hive executes the predicate in the where clause first. Therefore, any UDF in the select clause has no chance to see and count all values. Using the UDTF from this ticket seems to be the best solution: {code:sql} SELECT AVG(pa.pos) AS correctAverage FROM products p LATERAL VIEW POSEXPLODE(p.accessories) pa AS pos, accessory WHERE pa.accessory.manufacturer = 'noisemakers inc'; {code} > An explode function that includes the item's position in the array > ------------------------------------------------------------------ > > Key: HIVE-4943 > URL: https://issues.apache.org/jira/browse/HIVE-4943 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Affects Versions: 0.11.0 > Reporter: Niko Stahl > Labels: patch > Fix For: 0.11.0 > > Attachments: HIVE-4943.1.patch, HIVE-4943.2.patch > > Original Estimate: 8h > Remaining Estimate: 8h > > A function that explodes an array and includes an output column with the > position of each item in the original array. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira