pvary commented on pull request #3392:
URL: https://github.com/apache/iceberg/pull/3392#issuecomment-957774548
@openinx: Seems more like a Hive bug to me. Working with Hive master BTW.
Simplified your use-case to this (no partitions):
```
@Test
public void testBug() {
  shell.setHiveSessionValue("hive.cbo.enable", true);

  String engine_config = "CREATE TABLE engine_config (\n" +
      " application_id STRING,\n" +
      " config STRING\n" +
      " )\n" +
      " STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'";
  shell.executeStatement(engine_config);
  shell.executeStatement("INSERT INTO engine_config VALUES ('APP-1', 'c0')");

  String process_info = "CREATE TABLE process_info (\n" +
      " application_id STRING,\n" +
      " engine STRING,\n" +
      " node_type STRING,\n" +
      " bytes_read BIGINT,\n" +
      " bytes_write BIGINT,\n" +
      " fd_count_avg BIGINT,\n" +
      " thread_count_avg INT\n" +
      " )\n" +
      " STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'";
  shell.executeStatement(process_info);
  shell.executeStatement(
      "INSERT INTO process_info VALUES('APP-1','map', 'MR', 0, 0, 0, 0)");

  String mr_job_info = "CREATE TABLE mr_job_info (\n" +
      " application_id STRING,\n" +
      " queue STRING,\n" +
      " user_name STRING\n" +
      " )\n" +
      " STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'";
  shell.executeStatement(mr_job_info);
  shell.executeStatement("INSERT INTO mr_job_info VALUES ('APP-1','QUEUE-1', 'openinx')");

  String mr_task_info = "CREATE TABLE mr_task_info (\n" +
      " application_id STRING,\n" +
      " elapse_time STRING\n" +
      " )\n" +
      " STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'";
  shell.executeStatement(mr_task_info);
  shell.executeStatement("INSERT INTO mr_task_info VALUES ('APP-1','NaN')");

  String query = "SELECT\n" +
      " j.user_name,\n" +
      " r.bytes_read\n" +
      "FROM\n" +
      " process_info r\n" +
      " JOIN engine_config c ON \n" +
      " r.application_id = c.application_id\n" +
      " JOIN mr_job_info j ON \n" +
      " j.application_id = c.application_id\n";
  shell.executeStatement(query);
}
```
It seems that when reading, the `SelectOperator` only tries to read the
`application_id` column. While debugging, I found that when we calculate the
columns at #2052 the values are correct, so we lose the data somewhere later.
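One way to narrow down where the projection gets lost is to compare the columns the reader is actually asked for with the columns the query needs, at each step of the read path. A minimal sketch, assuming the projection is available as a comma-separated string (the form Hive keeps under `hive.io.file.readcolumn.names`); `ProjectionCheck` and `missingColumns` are hypothetical helper names, not part of Hive or Iceberg, and the sample values are only illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical debugging helper, not part of Hive or Iceberg.
class ProjectionCheck {

  // Returns the needed columns that are absent from a comma-separated projection list.
  static List<String> missingColumns(String projectedCsv, List<String> needed) {
    Set<String> projected = new LinkedHashSet<>();
    for (String name : projectedCsv.split(",")) {
      String trimmed = name.trim();
      if (!trimmed.isEmpty()) {
        // Hive column names are case-insensitive, so compare in lower case.
        projected.add(trimmed.toLowerCase());
      }
    }

    List<String> missing = new ArrayList<>();
    for (String column : needed) {
      if (!projected.contains(column.toLowerCase())) {
        missing.add(column);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Illustrative values: the projection seen in the repro versus what the
    // query reads from process_info (the join key plus bytes_read).
    String projected = "application_id";
    List<String> needed = Arrays.asList("application_id", "bytes_read");
    System.out.println(missingColumns(projected, needed)); // prints [bytes_read]
  }
}
```

Calling such a check right after the columns are calculated and again just before the reader is created would show between which two points `bytes_read` drops out.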
I will need some more time to find the problem.
Thanks for the good repro case!