[
https://issues.apache.org/jira/browse/HIVE-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651590#action_12651590
]
David Phillips commented on HIVE-40:
------------------------------------
I see you actually wrote those examples, so I'm probably misunderstanding the
problem.
It seems the issue here is that Apache logs have three quote characters (double
quote, left bracket, right bracket):
{noformat}1.2.3.4 - - [26/Nov/2008:17:59:27 -0600] "HEAD / HTTP/1.0" 302 - "-" "-"{noformat}
Specifying these characters as a regex is messy because of the multiple levels
of quoting. We could instead have a parameter that takes a string of one or more
literal quote characters (rather than a regex):
{noformat}'quote.chars' = '"[]'{noformat}
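To make the idea concrete, here is a rough sketch in Python of how a splitter might honor such a character list. It is not Hive or TCTLSeparatedProtocol code: the names split_quoted and BRACKET_PAIRS are made up for illustration, and the pairing rule (a left bracket is closed by its right bracket when both are listed, anything else closes itself) is my assumption about how a flat list like '"[]' would be interpreted.

```python
# Hypothetical helper, for illustration only; not part of Hive.
# Assumed rule: brackets pair up when both halves are listed in quote_chars,
# and every other listed character (e.g. a double quote) closes itself.
BRACKET_PAIRS = {"[": "]", "(": ")", "{": "}", "<": ">"}


def split_quoted(line, separator=" ", quote_chars='"[]'):
    """Split line on separator, keeping quoted runs together as one field."""
    openers_by_closer = {v: k for k, v in BRACKET_PAIRS.items()}

    # Map each opening quote character to the character that closes it.
    closers = {}
    for ch in quote_chars:
        if ch in BRACKET_PAIRS and BRACKET_PAIRS[ch] in quote_chars:
            closers[ch] = BRACKET_PAIRS[ch]
        elif ch in openers_by_closer and openers_by_closer[ch] in quote_chars:
            continue  # pure closing bracket; already handled by its opener
        else:
            closers[ch] = ch

    fields, current, closer = [], [], None
    for ch in line:
        if closer is not None:
            # Inside a quoted run: copy everything up to the closing character.
            if ch == closer:
                closer = None
            else:
                current.append(ch)
        elif ch in closers:
            closer = closers[ch]
        elif ch == separator:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


log_line = '1.2.3.4 - - [26/Nov/2008:17:59:27 -0600] "HEAD / HTTP/1.0" 302 - "-" "-"'
fields = split_quoted(log_line)
```

With the sample line above, the timestamp and the request each come back as a single field with the quote characters stripped. The sketch ignores escaped quote characters, and an unterminated quote simply runs to the end of the line, so a real implementation would need to decide how to handle both.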
> Hive Deserializer for plain text with separators simple support for quoting
> ----------------------------------------------------------------------------
>
> Key: HIVE-40
> URL: https://issues.apache.org/jira/browse/HIVE-40
> Project: Hadoop Hive
> Issue Type: Bug
> Reporter: Pete Wyckoff
>
> Hive does currently support things like Apache log format where the separator
> is " " but strings are quoted. But, to do this, the field separator specified
> on the command line has to be horrific.
> TCTLSeparatedProtocol could take another parameter QuoteCharacter and then
> when this is set, respect quoting.