[
https://issues.apache.org/jira/browse/HIVE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585177#comment-13585177
]
Ashutosh Chauhan commented on HIVE-4044:
----------------------------------------
URL is an unusual type to add to a query processing engine. Can you spec out
the motivation for adding this type (e.g. you can always use the string type
for URLs)? I am assuming from your description above that it might improve
storage efficiency through a better encoding of URLs. But I see the following
comment in LazyBinaryURL:
/**
 * The serialization of LazyBinaryURL is the same as the binary representation
 * of the underlying string
 */
and URLWritable also has:
{code}
@Override
public void write(DataOutput out) throws IOException {
  if (url != null) {
    byte[] bytes = url.toString().getBytes();
    WritableUtils.writeVInt(out, bytes.length);
    out.write(bytes);
  } else {
    WritableUtils.writeVInt(out, 0);
  }
}
{code}
So it seems you are storing URLs as strings anyway, both for the intermediate
data of MR jobs and for the output of the query. I don't see how this results
in better storage efficiency.
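To make the point concrete, here is a minimal, self-contained sketch (not code
from the patch). It assumes a fixed 4-byte length prefix as a stand-in for
WritableUtils.writeVInt and uses java.net.URI in place of whatever type the
patch wraps; the class and method names are hypothetical. Under those
assumptions, serializing a URL the way the quoted write() does yields exactly
the same bytes as serializing the plain string.

```java
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; none of these names come from the patch.
class UrlSerializationSketch {

    // Assumption: a fixed 4-byte length prefix stands in for Hadoop's
    // variable-length WritableUtils.writeVInt. Both paths below share this
    // prefix, so the comparison does not depend on the prefix format.
    static byte[] lengthPrefixed(byte[] payload) {
        ByteBuffer buffer = ByteBuffer.allocate(4 + payload.length);
        buffer.putInt(payload.length);
        buffer.put(payload);
        return buffer.array();
    }

    // Mirrors the quoted write(): the payload is the bytes of url.toString(),
    // and a null URL is written as a zero length with no payload.
    static byte[] serializeAsUrl(URI url) {
        if (url == null) {
            return lengthPrefixed(new byte[0]);
        }
        return lengthPrefixed(url.toString().getBytes(StandardCharsets.UTF_8));
    }

    // What a plain string column would store under the same scheme.
    static byte[] serializeAsString(String text) {
        return lengthPrefixed(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String text = "http://hive.apache.org/docs?lang=en";
        byte[] asUrl = serializeAsUrl(URI.create(text));
        byte[] asString = serializeAsString(text);
        System.out.println("URL bytes:    " + asUrl.length);
        System.out.println("String bytes: " + asString.length);
        System.out.println("Identical:    " + Arrays.equals(asUrl, asString));
    }
}
```

A component-wise encoding would have to change the payload itself, not just
the declared type, for the two sizes to differ.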
> Add URL type
> ------------
>
> Key: HIVE-4044
> URL: https://issues.apache.org/jira/browse/HIVE-4044
> Project: Hive
> Issue Type: Improvement
> Reporter: Samuel Yuan
> Assignee: Samuel Yuan
> Attachments: HIVE-4044.HIVE-4044.HIVE-4044.D8799.1.patch
>
>
> Having a separate type for URLs would enable improvements in storage
> efficiency based on breaking up a URL into its components. The new type will
> be named "URL" and made a non-reserved keyword (see HIVE-701).