Hello Jim Apple, I have tested the original delimiter settings, and the logs
show the differences between my commit and the original settings, as below:
My commit: Field terminators can't be an empty string.
Original setting: No terminator can be empty. (I will extend my restriction
to cover this in my next patch.)

My commit: Tuple delimiter can't be the first byte of the field delimiter.
Original setting: Field delimiter and line delimiter can't have the same
value. (So these two restrictions are actually the same one.)

My commit: Escape char can't be the first byte of the field delimiter.
Original setting: Warning: escape char will be ignored. (I will relax my
restriction to match this in my next patch.)

My commit: No restriction on escape char and line terminator.
Original setting: Warning: escape char will be ignored. (I will add this
warning in my next patch.)

My commit: Terminator contains '\0'.
Original setting: ImpalaRuntimeException. (See the logs for details. I add
this restriction to fix this runtime exception.)
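
To make the comparison concrete, here is a minimal Python sketch of the checks as they behave with the original setting, plus the proposed '\0' restriction. It is only an illustration, not the Impala analyzer code: the function name check_delimiters, its parameters, and the message texts are hypothetical, and it assumes a clause left out of the DDL statement is simply skipped rather than compared against a default value.

```python
from typing import List, Optional


def check_delimiters(field: Optional[str] = None,
                     line: Optional[str] = None,
                     escape: Optional[str] = None) -> List[str]:
    """Return warnings for a delimiter combination; raise ValueError if rejected.

    Hypothetical helper: None means the clause was not given in the statement.
    """
    named = {
        "Field delimiter": field,
        "Line delimiter": line,
        "Escape character": escape,
    }
    for name, value in named.items():
        if value is None:
            continue  # clause not present, nothing to check
        if value == "":
            raise ValueError(f"{name} can't be an empty string")
        if "\0" in value:
            # Proposed restriction: reject NUL before it reaches the metastore.
            raise ValueError(f"{name} can't contain '\\0'")

    # Rejected outright: field and line delimiters with the same value.
    if field is not None and field == line:
        raise ValueError("Field delimiter and line delimiter have same value")

    # Accepted with a warning: the escape character collides with a delimiter.
    warnings = []
    if escape is not None:
        if escape == field:
            warnings.append("Field delimiter and escape character have same "
                            "value. Escape character will be ignored")
        if escape == line:
            warnings.append("Line delimiter and escape character have same "
                            "value. Escape character will be ignored")
    return warnings
```

For example, check_delimiters(field=",", escape=",") returns a single warning, while check_delimiters(field=",", line=",") raises ValueError.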
Detailed logs:
Terminator is an empty string
[nobida147:21000] > create table field_null(id int) row format delimited fields
terminated by "";
Query: create table field_null(id int) row format delimited fields terminated
by ""
Query submitted at: 2016-07-25 10:20:41 (Coordinator: http://0.0.0.0:25000)
ERROR: AnalysisException: ESCAPED BY values and LINE/FIELD terminators must be
specified as a single character or as a decimal value in the range [-128:127]:
[nobida147:21000] > create table line_null(id int) row format delimited lines
terminated by "";
Query: create table line_null(id int) row format delimited lines terminated by
""
Query submitted at: 2016-07-25 10:20:54 (Coordinator: http://0.0.0.0:25000)
ERROR: AnalysisException: ESCAPED BY values and LINE/FIELD terminators must be
specified as a single character or as a decimal value in the range [-128:127]:
[nobida147:21000] > create table escape_null(id int) row format delimited
escaped by "";
Query: create table escape_null(id int) row format delimited escaped by ""
Query submitted at: 2016-07-25 10:21:13 (Coordinator: http://0.0.0.0:25000)
ERROR: AnalysisException: ESCAPED BY values and LINE/FIELD terminators must be
specified as a single character or as a decimal value in the range [-128:127]:
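
The rule quoted in this error message (a single character, or a decimal value in the range [-128:127]) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the actual Impala parsing code: parse_terminator is a hypothetical name, escape sequences are not handled, and a one-character literal such as "5" is assumed to mean the character itself rather than the number 5.

```python
import re


def parse_terminator(value: str) -> int:
    """Map a terminator literal to a signed byte value, or raise ValueError."""
    message = ("must be a single character or a decimal value "
               "in the range [-128:127]: %r" % value)
    if len(value) == 1:
        code = ord(value)
        if code > 127:
            raise ValueError(message)
        return code  # assumption: one character is taken literally
    # Anything else (including the empty string) must be a plain decimal number.
    if not re.fullmatch(r"-?[0-9]+", value):
        raise ValueError(message)
    number = int(value)
    if not -128 <= number <= 127:
        raise ValueError(message)
    return number
```

Under these assumptions, parse_terminator(",") gives 44, the byte value reported in the logs for ",", and parse_terminator("") raises ValueError, matching the three empty-string cases above.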
Field delimiter and line delimiter have same value
[nobida147:21000] > create table line_equal_field(id int) row format delimited
fields terminated by "," lines terminated by ",";
Query: create table line_equal_field(id int) row format delimited fields
terminated by "," lines terminated by ","
Query submitted at: 2016-07-25 10:23:45 (Coordinator: http://0.0.0.0:25000)
ERROR: AnalysisException: Field delimiter and line delimiter have same value:
byte 44
Field delimiter and escape character have same value
[nobida147:21000] > create table escape_equal_field(id int) row format
delimited fields terminated by "," escaped by ",";
Query: create table escape_equal_field(id int) row format delimited fields
terminated by "," escaped by ","
Query submitted at: 2016-07-25 10:22:48 (Coordinator: http://0.0.0.0:25000)
Query progress can be monitored at:
http://0.0.0.0:25000/query_plan?query_id=924c6b616e183f62:7c4779a423b29d96
++
||
++
++
WARNINGS: Field delimiter and escape character have same value: byte 44. Escape
character will be ignored
Fetched 0 row(s) in 0.16s
Line delimiter and escape character have same value
[nobida147:21000] > create table escape_equal_line(id int) row format delimited
escaped by "," lines terminated by ',';
Query: create table escape_equal_line(id int) row format delimited escaped by
"," lines terminated by ','
Query submitted at: 2016-07-25 10:23:21 (Coordinator: http://0.0.0.0:25000)
Query progress can be monitored at:
http://0.0.0.0:25000/query_plan?query_id=f443df31f58860bb:1c01f402050f35b3
++
||
++
++
WARNINGS: Line delimiter and escape character have same value: byte 44. Escape
character will be ignored
Fetched 0 row(s) in 0.13s
Delimiter contains '\0'
[nobida147:21000] > create table contains_zero(id int) row format delimited
fields terminated by "\0";
Query: create table contains_zero(id int) row format delimited fields
terminated by "\0"
Query submitted at: 2016-07-25 10:08:39 (Coordinator: http://0.0.0.0:25000)
ERROR:
ImpalaRuntimeException: Error making 'createTable' RPC to Hive Metastore:
CAUSED BY: MetaException: javax.jdo.JDODataStoreException: Put request failed :
INSERT INTO "SERDE_PARAMS" ("PARAM_VALUE","SERDE_ID","PARAM_KEY") VALUES
(?,?,?)
at
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
at
org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
at
org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
at
org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:902)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114)
at com.sun.proxy.$Proxy0.createTable(Unknown Source)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1469)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1502)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:138)
at
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
. . .
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
at
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.postgresql.util.PSQLException: ERROR: invalid byte sequence for
encoding "UTF8": 0x00
at
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
at
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:334)
at
com.jolbox.bonecp.PreparedStatementHandle.executeUpdate(PreparedStatementHandle.java:205)
at
org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeUpdate(ParamLoggingPreparedStatement.java:399)
at
org.datanucleus.store.rdbms.SQLController.executeStatementUpdate(SQLController.java:439)
at
org.datanucleus.store.rdbms.scostore.JoinMapStore.internalPut(JoinMapStore.java:1069)
... 70 more
In conclusion, there is no difference from the current restrictions on field
terminators. (In my next patch, I will just inherit the current restrictions
and add one: delimiters can't contain '\0'. This fixes the
ImpalaRuntimeException shown in the log above.)
------------------ Original Message ------------------
From: "jbapple";<[email protected]>;
Sent: Sunday, July 24, 2016, 9:20
To: "Yuanhao Luo"<[email protected]>;
Cc: "dev@impala"<[email protected]>;
Subject: Re: IMPALA-2428 Support multiple-character string as the field delimiter
We must be very careful about breaking changes. We may want to put
this change in Impala 3.0, rather than 2.x, if it breaks existing DDL
statements.
> Field terminator can't be an empty string
How is this different than the current restrictions on field terminators?
If field terminators can currently be empty strings, what kind of
queries or DDL statements does this break?
Do we currently have tests for those? Do we expect that many users are
using them?
These questions are also of interest to me on your three other restrictions.