Hello Marcel Kornacker, Internal Jenkins, Skye Wanderman-Milne,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/2110
to look at the new patch set (#23).
Change subject: IMPALA-1740: Add support for skip.header.line.count.
......................................................................
IMPALA-1740: Add support for skip.header.line.count.
HIVE-5795 introduced a parameter skip.header.line.count to skip header
lines from input files. This change introduces the capability to skip
an arbitrary number of header lines from csv input files on hdfs. The
size of the total file header must be smaller than
max_scan_range_length, otherwise an error will be reported. This is
necessary because scan ranges are not read in disk order, so there is
no way of identifying header lines except by counting from the start
of the first scan range.
[localhost:21000] > alter table t1 set
tblproperties('skip.header.line.count'='1');
Query: alter table t1 set tblproperties('skip.header.line.count'='1')
[localhost:21000] > select * from t1;
Query: select * from t1
+----+----+
| c1 | c2 |
+----+----+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
+----+----+
Fetched 3 row(s) in 0.32s
[localhost:21000] > alter table t1 set
tblproperties('skip.header.line.count'='0');
Query: alter table t1 set tblproperties('skip.header.line.count'='0')
[localhost:21000] > select * from t1;
Query: select * from t1
+------+------+
| c1 | c2 |
+------+------+
| NULL | NULL |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
+------+------+
WARNINGS: Error converting column: 0 TO INT (Data is: num1)
Error converting column: 1 TO DOUBLE (Data is: num2)
file: hdfs://localhost:20500/test-warehouse/t1/test.txt
record: num1,num2
Fetched 4 row(s) in 0.41s
Change-Id: I595f01a165d41499ca1956fe748ba3840a6eb543
---
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/hdfs-text-table-writer.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/com/cloudera/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/com/cloudera/impala/analysis/BaseTableRef.java
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
M fe/src/main/java/com/cloudera/impala/planner/SingleNodePlanner.java
A testdata/data/table_with_header.csv
A testdata/data/table_with_header_2.csv
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M testdata/workloads/functional-query/queries/QueryTest/create.test
A
testdata/workloads/functional-query/queries/QueryTest/hdfs-text-scan-with-header.test
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M tests/query_test/test_scanners.py
25 files changed, 456 insertions(+), 20 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/10/2110/23
--
To view, visit http://gerrit.cloudera.org:8080/2110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I595f01a165d41499ca1956fe748ba3840a6eb543
Gerrit-PatchSet: 23
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]>