[
https://issues.apache.org/jira/browse/HIVE-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273482#comment-13273482
]
Phabricator commented on HIVE-1719:
-----------------------------------
cwsteinbach has commented on the revision "HIVE-1719 [jira] Move RegexSerDe out
of hive-contrib and over to hive-serde".
INLINE COMMENTS
contrib/src/test/queries/clientnegative/serde_regex.q:24 Since the first
CREATE TABLE statement is expected to fail, we should probably remove the rest
of this code.
contrib/src/test/queries/clientpositive/serde_regex.q:3 This isn't necessary.
QTestUtil.clearTestSideEffects() automatically drops any tables/dbs that aren't
defined in QTestUtil.srcTables. (Also, take a look at QTestUtil.createSources()
... I remember you asked about this yesterday).
contrib/src/test/queries/clientpositive/serde_regex.q:43 Please remove.
contrib/src/test/queries/clientnegative/serde_regex.q:41 Please refer to a
system variable (set in one of the Ant build files) instead of using a relative
path, e.g. "${system:test.src.data.dir}/files/apache.access.log"
contrib/src/test/queries/clientpositive/serde_regex.q:38 Replace relative
paths.
ql/src/test/queries/clientnegative/serde_regex.q:2 Please remove.
ql/src/test/queries/clientnegative/serde_regex.q:23 Remove the rest of this
code if we're never going to hit it.
ql/src/test/queries/clientnegative/serde_regex2.q:2 Please remove.
ql/src/test/queries/clientnegative/serde_regex2.q:3 Same test as
clientnegative/serde_regex.q?
ql/src/test/queries/clientnegative/serde_regex2.q:20 Remove dead code.
ql/src/test/queries/clientnegative/serde_regex3.q:1 Remove. Replace with 'USE
default;' if you're running into problems placing a comment on the first line
of the file.
ql/src/test/queries/clientpositive/serde_regex.q:1 Remove.
ql/src/test/queries/clientpositive/serde_regex.q:4 I think we should either
roll the clientpositive tests into a single file (my preference), or modify the
qfile file names so that they give some hint as to what is being tested. Please
take a look at clientpositive/database.q for a good example of the former
approach.
ql/src/test/queries/clientpositive/serde_regex2.q:27 Please add a newline.
You might want to configure your editor to add these automatically.
ql/src/test/queries/clientpositive/serde_regex.q:39 It's a little dangerous
using 'SELECT *' in tests because it's not always possible to easily determine
from the output which column contains which field. In addition to 'SELECT *',
please also include a query which selects a single column in the middle in of
the table, and a query which selects a subset of two or more columns from the
table.
ql/src/test/queries/clientpositive/serde_regex2.q:4 I thought about this some
more and think we should probably throw a runtime exception in the
deserialize() method if this condition is detected. Otherwise, we're basically
signing up to support this behavior in the future, which is not what we want to
do.
Please file a JIRA for this issue (i.e. that we want to catch this condition
at compile-time instead of runtime), and then move this test case over to the
clientnegative directory and cite the JIRA in a comment. Thanks.
ql/src/test/queries/clientpositive/serde_regex3.q:3 This comment implies that
output.format.string actually did something in the past, which isn't true.
Also, the warning message doesn't show up in the q.out file, so I think we
should probably just skip referencing this property in the tests.
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:42 Need to
update this javadoc. RegexSerDe only provides deserialization capabilities.
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:217 Formatting:
Braces always go on their own line. This project basically uses the Sun Java
Coding standard (http://www.oracle.com/technetwork/java/codeconv-138413.html)
with a 100 char line limit and 2 character indent.
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:217 "Not support
yet" is inaccurate since we never plan to support this feature. Please change
this to "RegexSerDe does not support the serialize() method"
serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:188 It's
possible that this will get triggered for every row in the input file, which
will quickly overflow the logs. We should log this at most once. Same thing
goes for the log message below.
Currently SerDes don't really have a good way of logging error information
like this. We should talk offline about ways of improving this.
REVISION DETAIL
https://reviews.facebook.net/D3141
To: JIRA, cwsteinbach
Cc: shreepadma
> Move RegexSerDe out of hive-contrib and over to hive-serde
> ----------------------------------------------------------
>
> Key: HIVE-1719
> URL: https://issues.apache.org/jira/browse/HIVE-1719
> Project: Hive
> Issue Type: Task
> Components: Serializers/Deserializers
> Reporter: Carl Steinbach
> Assignee: Shreepadma Venugopalan
> Attachments: HIVE-1719.3.patch.txt, HIVE-1719.D3051.1.patch,
> HIVE-1719.D3051.2.patch, HIVE-1719.D3141.1.patch, HIVE-1719.patch
>
>
> RegexSerDe is as much a part of the standard Hive distribution as the other
> SerDes
> currently in hive-serde. I think we should move it over to the hive-serde
> module so that
> users don't have to go to the added effort of manually registering the
> contrib jar before
> using it.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira