[ https://issues.apache.org/jira/browse/HADOOP-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628843#action_12628843 ]

Ashish Thusoo commented on HADOOP-4085:
---------------------------------------

Comments are below. The most important one concerns how the grammar treats the 
character set name. Ideally it would be an identifier rather than a fixed token 
(similar to table name identifiers); with that approach we could easily support 
any character set.

Inline Comments:
cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java:85: nitpick: can we 
follow the convention of putting the opening brace on the same line as the code?
ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:781: Instead of having a fixed 
token per character set in the grammar, we should define a character-set 
identifier and pass it through to the Java calls. That scales much better and 
would let us seamlessly support any character set supported by the Java 
runtime.
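A minimal sketch of what "pass that across to the java calls" could look like, assuming the parser hands the raw identifier text (MySQL-style, leading underscore included, e.g. `_utf8`) to Java code. The class and method names (`CharsetLiteral`, `resolve`, `decodeHex`) are hypothetical and not taken from the patch; only `java.nio.charset.Charset` is a real API here. The hex decoding corresponds to the `_utf8 0x<HexValue>` form in the issue description.

```java
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

public class CharsetLiteral {

    // Hypothetical helper: turns an introducer such as "_utf8" into a JVM Charset.
    // Any name the JVM knows (including ones added via a CharsetProvider)
    // resolves here without touching the grammar.
    static Charset resolve(String introducer) {
        String name = introducer.startsWith("_") ? introducer.substring(1) : introducer;
        // Charset.isSupported throws IllegalCharsetNameException for malformed names
        if (!Charset.isSupported(name)) {
            throw new UnsupportedCharsetException(name);
        }
        return Charset.forName(name);
    }

    // Hypothetical helper: decodes a hex literal such as "0xC3A9" with the given charset.
    static String decodeHex(String literal, Charset cs) {
        if (!literal.startsWith("0x") && !literal.startsWith("0X")) {
            throw new IllegalArgumentException("expected 0x prefix: " + literal);
        }
        String hex = literal.substring(2);
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits: " + literal);
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        Charset cs = resolve("_utf8");
        System.out.println(cs.name());                                 // UTF-8
        System.out.println((int) decodeHex("0xC3A9", cs).charAt(0));   // 233 (U+00E9)
    }
}
```

Because lookup goes through `Charset.forName`, adding a character set becomes a JVM configuration matter rather than a grammar change.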

 http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/Charset.html 

has information on the rules that define a legal character set name and on how 
new character sets can be added to the JVM through a CharsetProvider. So the 
rule for the character-set literal could look something like

 charSetStringLiteral : charSetIdentifier StringLiteral ;

where charSetIdentifier is defined in terms of the naming rules in the link above.
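As a rough illustration of defining charSetIdentifier "in terms of the rules mentioned in the link above", here is a regex equivalent of such a lexer rule in Java. It assumes the identifier keeps MySQL's leading underscore, and it encodes the legal-name rules as described in the current `Charset` Javadoc (first character a letter or digit; then letters, digits, `-`, `+`, `:`, `_`, `.`); the 1.4.2 page may state a slightly narrower set. `CharSetIdentifierRule` and `isCharSetIdentifier` are hypothetical names.

```java
import java.util.regex.Pattern;

public class CharSetIdentifierRule {

    // Legal charset names per the java.nio.charset.Charset Javadoc:
    // a letter or digit first, then letters, digits, '-', '+', ':', '_' or '.'.
    // (Recent JDKs list '+'; older releases may not.)
    private static final Pattern CHARSET_NAME =
            Pattern.compile("[A-Za-z0-9][A-Za-z0-9\\-+:_.]*");

    // Hypothetical lexer-side check, assuming charSetIdentifier is "_" followed
    // by a legal charset name (MySQL introducer style, e.g. "_utf8").
    static boolean isCharSetIdentifier(String token) {
        return token.length() > 1
                && token.charAt(0) == '_'
                && CHARSET_NAME.matcher(token.substring(1)).matches();
    }

    public static void main(String[] args) {
        for (String t : new String[] {"_utf8", "_ISO-8859-1", "utf8", "_-x", "_"}) {
            System.out.println(t + " -> " + isCharSetIdentifier(t));
        }
    }
}
```

A syntactically legal name still has to be checked with `Charset.isSupported` at semantic-analysis time, since the grammar cannot know which character sets a given JVM provides.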

ql/src/test/queries/clientpositive/inputddl4.q:0: Let's put a brief comment in 
this file describing what it actually tests.
ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java:157: nitpick: maybe we 
should call this PREFIX rather than SAME.
ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java:143: Shouldn't this 
check all the sort columns rather than the bucket columns? Is this a bug?
ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java:384: This function 
hardcodes the terminating character and the field delimiters, whereas in the 
current code these are parameterized. Parameterizing is better, since we later 
want to drive them through session-level properties.
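A minimal sketch of the parameterized alternative: the delimiters are constructor arguments with defaults, so a caller could later fill them from session-level properties. `DelimitedRowWriter` is a hypothetical class, not the code in Utilities.java; the defaults (Ctrl-A between fields, newline after each row) are Hive's usual text-format defaults.

```java
import java.util.Arrays;
import java.util.List;

public class DelimitedRowWriter {

    // Usual Hive text-format defaults: Ctrl-A between fields, newline after each row.
    static final char DEFAULT_FIELD_DELIM = '\001';
    static final char DEFAULT_LINE_TERMINATOR = '\n';

    private final char fieldDelim;
    private final char lineTerminator;

    // Delimiters are injected rather than hardcoded, so a caller can supply
    // values read from session-level properties (hypothetical wiring).
    DelimitedRowWriter(char fieldDelim, char lineTerminator) {
        this.fieldDelim = fieldDelim;
        this.lineTerminator = lineTerminator;
    }

    DelimitedRowWriter() {
        this(DEFAULT_FIELD_DELIM, DEFAULT_LINE_TERMINATOR);
    }

    // Joins the fields with the configured delimiter and appends the terminator.
    String format(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(fieldDelim);
            }
            sb.append(fields.get(i));
        }
        return sb.append(lineTerminator).toString();
    }

    public static void main(String[] args) {
        DelimitedRowWriter csv = new DelimitedRowWriter(',', '\n');
        System.out.print(csv.format(Arrays.asList("col_name", "string"))); // col_name,string
    }
}
```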

> internationalization support and sort order (ascending/descending) support in 
> create table
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4085
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4085
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hive
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: patch1
>
>
> Users cannot specify UTF-8 strings in queries, either for selection or for 
> filtering. MySQL syntax should be followed: 
> select _utf8 'string' from <TableName>
> select <selectExpr> from <TableName> where col = _utf8 0x<HexValue>
> To start with, UTF-8 strings should be supported; support for other character 
> sets can be added later on demand.
> Identifiers (table names, column names, etc.) cannot be UTF-8 strings; this 
> applies only to data values.
> Although the user can specify sorted columns in CREATE TABLE, there is no way 
> to specify whether they are sorted in ascending or descending order.
> The CREATE TABLE syntax should be enhanced to support that.
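To make the requested sort-order semantics concrete, here is a small Java sketch of how a per-column ASC/DESC flag would affect row ordering. The exact CREATE TABLE syntax is not given in the issue, so `SortOrderSpec`, `SortColumn`, and the "SORTED BY (col [ASC|DESC], ...)" shape in the comments are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortOrderSpec {

    enum Order { ASC, DESC }

    // One entry of a hypothetical "SORTED BY (col [ASC|DESC], ...)" clause.
    static final class SortColumn {
        final int index;    // position of the column within a row
        final Order order;

        SortColumn(int index, Order order) {
            this.index = index;
            this.order = order;
        }
    }

    // Compares rows column by column, honoring each column's direction;
    // later columns only break ties left by earlier ones.
    static Comparator<List<String>> rowComparator(List<SortColumn> cols) {
        return (a, b) -> {
            for (SortColumn c : cols) {
                String x = a.get(c.index);
                String y = b.get(c.index);
                // Swap the operands for DESC instead of negating (avoids overflow)
                int cmp = (c.order == Order.ASC) ? x.compareTo(y) : y.compareTo(x);
                if (cmp != 0) {
                    return cmp;
                }
            }
            return 0;
        };
    }

    public static void main(String[] args) {
        List<List<String>> rows = new ArrayList<>(Arrays.asList(
                Arrays.asList("a", "2"),
                Arrays.asList("b", "1"),
                Arrays.asList("a", "1")));
        // Equivalent of a hypothetical SORTED BY (c0 ASC, c1 DESC)
        rows.sort(rowComparator(Arrays.asList(
                new SortColumn(0, Order.ASC), new SortColumn(1, Order.DESC))));
        System.out.println(rows); // [[a, 2], [a, 1], [b, 1]]
    }
}
```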

-- 
This message is automatically generated by JIRA.