[ https://issues.apache.org/jira/browse/HIVE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andraz Tori reopened HIVE-693:
------------------------------
Amazon changed the format of the logs at the beginning of February 2010, so the
new regex is:
static Pattern regexpat = Pattern.compile(
    "(\\S+) (\\S+) \\[(.*?)\\] (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) \"(.+)\" "
    + "(\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) \"(.*)\" \"(.*)\"(?: -)?");
(The only difference is the addition of (?: -)? at the end.)
Since Amazon hasn't yet documented the last field, I don't know whether it is OK
to use a catch-all regex for that field instead of the very specific one I've added.
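
To make the trade-off concrete, here is a minimal sketch comparing the specific suffix above with one possible catch-all suffix. The class name, the CATCH_ALL pattern, and the sample log line are assumptions made up for illustration; they are not taken from the attached patch or from real S3 logs.

```java
import java.util.regex.Pattern;

class S3LogRegexSketch {
    // Everything up to and including the quoted user-agent field; shared by both variants.
    static final String COMMON =
        "(\\S+) (\\S+) \\[(.*?)\\] (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) \"(.+)\" "
        + "(\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) \"(.*)\" \"(.*)\"";

    // The specific variant from this comment: allow an optional trailing " -".
    static final Pattern SPECIFIC = Pattern.compile(COMMON + "(?: -)?");

    // Hypothetical catch-all variant: allow any trailing text after a space.
    static final Pattern CATCH_ALL = Pattern.compile(COMMON + "(?: .*)?");

    // Made-up line in the old layout; every value here is a placeholder.
    static final String OLD_LINE =
        "owner mybucket [06/Feb/2010:00:00:38 +0000] 192.0.2.3 requester 3E57427F3EXAMPLE "
        + "REST.GET.OBJECT photo.jpg \"GET /mybucket/photo.jpg HTTP/1.1\" "
        + "200 - 2662992 3462992 70 10 \"-\" \"curl/7.15.1\"";

    // True only if the pattern matches the entire line.
    static boolean matches(Pattern p, String line) {
        return p.matcher(line).matches();
    }

    public static void main(String[] args) {
        String newLine = OLD_LINE + " -";            // new layout with the extra "-" field
        String otherLine = OLD_LINE + " someValue";  // hypothetical non-dash value in that field

        System.out.println("old,   specific:  " + matches(SPECIFIC, OLD_LINE));
        System.out.println("new,   specific:  " + matches(SPECIFIC, newLine));
        System.out.println("other, specific:  " + matches(SPECIFIC, otherLine));
        System.out.println("other, catch-all: " + matches(CATCH_ALL, otherLine));
    }
}
```

Under these assumptions the specific suffix accepts both the old and the new layout but rejects a line whose last field is anything other than "-", while the catch-all suffix accepts it. The catch-all does not capture the extra field, so the set of capture groups stays the same in both variants.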
> Add a AWS S3 log format deserializer
> ------------------------------------
>
> Key: HIVE-693
> URL: https://issues.apache.org/jira/browse/HIVE-693
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Serializers/Deserializers
> Reporter: Zheng Shao
> Assignee: Andraz Tori
> Fix For: 0.5.0
>
> Attachments: HIVE-693.1.patch, HIVE-693.2.patch, inputs3.q, s3.log,
> s3deserializer.diff, S3LogDeserializer.java, S3LogStruct.java
>
>