[ https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542841#comment-16542841 ]
Zoltan Haindrich commented on HIVE-19943:
-----------------------------------------
I'm not sure how this is supposed to be fixed; adding the skip counts as
inputformat args is a dead end because the actual reader is some kind of
"linereader" from hadoop...
I feel that this "HiveRecordReader" should somehow be pushed under the
llaprecordreader, but that seems like a hard thing to do (and probably
not the right move)...
[~sershe] do you have any suggestion?
To reproduce, patch an existing test that by mistake only ran in local mode
(so it missed this issue all along) and run it with TestMiniLlapCliDriver:
{code}
diff --git ql/src/test/queries/clientpositive/file_with_header_footer.q ql/src/test/queries/clientpositive/file_with_header_footer.q
index 8913e54ad0..5dddcaba2a 100644
--- ql/src/test/queries/clientpositive/file_with_header_footer.q
+++ ql/src/test/queries/clientpositive/file_with_header_footer.q
@@ -11,6 +11,10 @@ CREATE EXTERNAL TABLE header_footer_table_1 (name string, message string, id int
SELECT * FROM header_footer_table_1;
+explain
+SELECT count(distinct name) FROM header_footer_table_1;
+SELECT assert_true(count(distinct name)=11) FROM header_footer_table_1;
+
SELECT * FROM header_footer_table_1 WHERE id < 50;
CREATE EXTERNAL TABLE header_footer_table_2 (name string, message string, id int) PARTITIONED BY (year int, month int, day int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2");
{code}
> Header values keep showing up in result sets
> --------------------------------------------
>
> Key: HIVE-19943
> URL: https://issues.apache.org/jira/browse/HIVE-19943
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 2.1.0
> Environment: HDInsight Hive Interactive Query
> [Components|https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning#hadoop-components-available-with-different-hdinsight-versions]
> Reporter: Liam De Lee
> Priority: Major
>
> We are using tblproperties ("skip.header.line.count"="1") when creating an
> external table.
> When we run select * from the table, the result set comes back as expected,
> without the header.
> However, when we run, for instance, count(1), the header is included in the
> count (verified by pasting the output of select * from the table into Notepad
> and counting the rows).
> Likewise, select distinct(column) from the table returns the header as a
> distinct value.
> File structure:
> ||_TESTING_TYPE||
> |adf|
> |hyg|
> |abc|
>
> *Update: 26/06/2018*
> Create statement:
> {code:java}
> -----------------------------------
> --test_type--
> -----------------------------------
> CREATE EXTERNAL TABLE IF NOT EXISTS ext.test_type_in
> (
> test_type string
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\073'
> STORED AS TEXTFILE
> LOCATION 'adl://{adlslocation}data/data2/test'
> tblproperties ("skip.header.line.count"="1")
> {code}
> Select statement:
> {code:java}
> select * from test_type_in;
> {code}
> Distinct statement:
> {code:java}
> select distinct test_type from test_type_in ORDER BY test_type;
> {code}
> I cannot show the exact statement because of an NDA, so I changed those
> values to test.
>
> I can also tell you this happens not just on our HDInsight cluster but also at
> another company we are working for. It does not matter what is in the data
> either.
> For testing purposes:
> {code:java}
> test_type,abcg,gjeiza,aze,grriajj,gd,rrjri,vdju
> {code}
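
The symptom in the description can be sketched roughly as follows. This is only an illustration of the expected versus reported results, not Hive's code: the `skip_header` helper is hypothetical, and it assumes the sample data above means one value per line with `test_type` as the header line.

```python
from typing import List


def skip_header(lines: List[str], header_count: int) -> List[str]:
    """Hypothetical helper: drop the first `header_count` lines,
    which is what "skip.header.line.count" is meant to achieve."""
    return lines[header_count:]


# Assumed file contents: header first, then one value per line
# (taken from the sample data in the issue description).
raw_lines = ["test_type", "abcg", "gjeiza", "aze", "grriajj", "gd", "rrjri", "vdju"]

# What "select *" is reported to return: the header is skipped.
expected_rows = skip_header(raw_lines, 1)

# What count(1) / select distinct are reported to see: the header is not skipped.
observed_rows = raw_lines

expected_distinct = sorted(set(expected_rows))
observed_distinct = sorted(set(observed_rows))

print("expected count:", len(expected_rows))
print("observed count:", len(observed_rows))
print("extra distinct values:", set(observed_distinct) - set(expected_distinct))
```

The only difference between the two result sets is the header value, which matches the report that count(1) is one too high and that the header shows up as a distinct value.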
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)