[ https://issues.apache.org/jira/browse/HIVE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681573#comment-13681573 ]

Samuel Yuan commented on HIVE-4044:
-----------------------------------

I tried breaking the URL into parts and encoding them as individual columns;
the dictionary shrank, but the overhead of the additional ORC columns
(mostly the column of indices) outweighed the savings, so compression was
actually worse overall. I also tried storing the query string as a map and
putting common keys into separate columns; this improved compression somewhat,
but still not enough to offset the overhead of the new columns for the query string.
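The trade-off can be illustrated with a toy model of dictionary encoding (distinct values stored once, plus one index per row per column). This is a minimal sketch, not ORC's real writer: it ignores the run-length encoding and general-purpose compression ORC applies to its streams. The sample URLs and the helpers dictionary_encode, encode_components and split_query are hypothetical; only urlsplit and parse_qsl come from Python's standard library.

```python
from urllib.parse import parse_qsl, urlsplit

def dictionary_encode(values):
    """Toy dictionary encoding: distinct values in first-seen order, plus one index per row."""
    ids = {}
    indices = [ids.setdefault(v, len(ids)) for v in values]
    return list(ids), indices

def dictionary_bytes(dictionary):
    """Total UTF-8 size of the distinct values stored in a dictionary."""
    return sum(len(v.encode("utf-8")) for v in dictionary)

# Hypothetical per-component columns, named after urlsplit's fields.
COMPONENTS = ("scheme", "netloc", "path", "query", "fragment")

def encode_components(urls):
    """Split each URL and dictionary-encode every component as its own column."""
    columns = {name: [getattr(urlsplit(u), name) for u in urls] for name in COMPONENTS}
    return {name: dictionary_encode(vals) for name, vals in columns.items()}

def split_query(query, common_keys):
    """Give common query keys their own columns (None if absent); put the rest in a map.
    Later duplicates of a key overwrite earlier ones."""
    promoted = {k: None for k in common_keys}
    rest = {}
    for key, value in parse_qsl(query, keep_blank_values=True):
        if key in promoted:
            promoted[key] = value
        else:
            rest[key] = value
    return promoted, rest

urls = [
    "http://example.com/a?id=1&ref=x",
    "http://example.com/b?id=2&ref=x",
    "http://example.com/a?id=3&ref=y",
]

# One column holding whole URLs.
whole_dict, whole_idx = dictionary_encode(urls)

# Five component columns, each with its own dictionary and index stream.
parts = encode_components(urls)
part_dict_bytes = sum(dictionary_bytes(d) for d, _ in parts.values())
part_index_count = sum(len(i) for _, i in parts.values())

print(dictionary_bytes(whole_dict), len(whole_idx))  # 93 3
print(part_dict_bytes, part_index_count)             # 49 15
```

On this made-up input the component dictionaries hold 49 bytes instead of 93, but the number of per-row indices grows from 3 to 15 because every new column carries its own index stream. split_query sketches the second variant, where keys such as id and ref become columns and the remaining parameters go into a map.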
                
> Add URL type
> ------------
>
>                 Key: HIVE-4044
>                 URL: https://issues.apache.org/jira/browse/HIVE-4044
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Samuel Yuan
>            Assignee: Samuel Yuan
>         Attachments: HIVE-4044.HIVE-4044.HIVE-4044.D8799.1.patch
>
>
> Having a separate type for URLs would enable improvements in storage 
> efficiency based on breaking up a URL into its components. The new type will 
> be named "URL" and made a non-reserved keyword (see HIVE-701).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
