Hyunsik Choi created TAJO-806:
---------------------------------

             Summary: CreateTableNode in CTAS has a wrong schema as output schema and table schema.
                 Key: TAJO-806
                 URL: https://issues.apache.org/jira/browse/TAJO-806
             Project: Tajo
          Issue Type: Bug
          Components: planner/optimizer, storage
            Reporter: Hyunsik Choi
            Assignee: Hyunsik Choi
             Fix For: 0.9.0, 0.8.1


In the case below, TajoWriteSupport currently takes the schema of the source 
table {{orders}}. In other words, each column qualifier is {{default.orders}} 
instead of {{default.parquet_test}}. This is a bug. In this case, the following 
error occurs when reading the Parquet files.

{noformat}
default> create table parquet_test using parquet as select * from orders;
Progress: 0%, response time: 1.119 sec
Progress: 0%, response time: 2.121 sec
Progress: 0%, response time: 3.123 sec
Progress: 83%, response time: 4.126 sec
Progress: 100%, response time: 4.709 sec
(1500000 rows, 4.709 sec, 109.9 MiB inserted)

default> select * from parquet_test;
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" java.lang.NullPointerException
        at parquet.hadoop.InternalParquetRecordReader.close(InternalParquetRecordReader.java:118)
        at parquet.hadoop.ParquetReader.close(ParquetReader.java:144)
        at org.apache.tajo.storage.parquet.ParquetScanner.close(ParquetScanner.java:87)
        at org.apache.tajo.storage.MergeScanner.close(MergeScanner.java:137)
        at org.apache.tajo.jdbc.TajoResultSet.close(TajoResultSet.java:153)
        at org.apache.tajo.cli.TajoCli.localQueryCompleted(TajoCli.java:387)
        at org.apache.tajo.cli.TajoCli.executeQuery(TajoCli.java:365)
        at org.apache.tajo.cli.TajoCli.executeParsedResults(TajoCli.java:322)
        at org.apache.tajo.cli.TajoCli.runShell(TajoCli.java:311)
        at org.apache.tajo.cli.TajoCli.main(TajoCli.java:490)
Apr 30, 2014 11:04:01 AM INFO: parquet.hadoop.ParquetFileReader: reading another 1 footers
{noformat}

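This patch fixes the bug so that CreateTableNode uses the target table's schema, 
qualified with the target table name, as both its output schema and its table 
schema, instead of the source table's schema.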
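A minimal sketch of what the fix has to guarantee, assuming a hypothetical simplified `Column` record (not Tajo's catalog API) and illustrative TPC-H-style column names: every column of the CTAS output schema is re-qualified with the target table's name instead of keeping the source table's qualifier. It needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

public class RequalifySketch {
    // Hypothetical simplified column: a "db.table" qualifier plus a simple name.
    record Column(String qualifier, String name) {
        String qualifiedName() {
            return qualifier + "." + name;
        }
    }

    // Copies the source columns, replacing each qualifier with the target table's.
    static List<Column> requalify(List<Column> source, String targetQualifier) {
        List<Column> out = new ArrayList<>(source.size());
        for (Column c : source) {
            out.add(new Column(targetQualifier, c.name()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Source schema as produced by "select * from orders" (illustrative columns).
        List<Column> orders = List.of(
                new Column("default.orders", "o_orderkey"),
                new Column("default.orders", "o_custkey"));
        // Schema the CTAS target "parquet_test" should carry.
        for (Column c : requalify(orders, "default.parquet_test")) {
            System.out.println(c.qualifiedName());
        }
        // prints:
        // default.parquet_test.o_orderkey
        // default.parquet_test.o_custkey
    }
}
```

Only the qualifier changes; the simple column names and their order are kept, so the data written by the SELECT still lines up with the new schema.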

In addition, I found a potential problem: the Parquet file stores the Tajo 
schema in its extra metadata. This will cause problems when users rename the 
database or the table, because the qualifiers stored in the file would no longer 
match the catalog. So, I removed the code that inserts the Tajo schema into the 
extra metadata, and I changed Parquet reading so that it no longer uses the 
extra metadata.
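To illustrate the rename problem, here is a hypothetical sketch (the helpers `qualify` and `unresolved` are illustrative, not Tajo code) in which qualified column names are frozen into a file at write time and later compared against the catalog after the table has been renamed. It needs Java 16+ for `Stream.toList()`.

```java
import java.util.List;

public class StaleMetadataSketch {
    // Builds qualified names ("db.table.column") for the given table.
    static List<String> qualify(String tableQualifier, List<String> simpleNames) {
        return simpleNames.stream().map(n -> tableQualifier + "." + n).toList();
    }

    // Returns the catalog columns that have no match among the names stored in the file.
    static List<String> unresolved(List<String> catalogNames, List<String> storedNames) {
        return catalogNames.stream().filter(n -> !storedNames.contains(n)).toList();
    }

    public static void main(String[] args) {
        List<String> cols = List.of("o_orderkey", "o_custkey");
        // Names frozen into the file's extra metadata when it was written.
        List<String> stored = qualify("default.parquet_test", cols);
        // The table is later renamed in the catalog; the file on disk is untouched.
        List<String> catalog = qualify("default.parquet_renamed", cols);
        System.out.println(unresolved(catalog, stored));
        // prints [default.parquet_renamed.o_orderkey, default.parquet_renamed.o_custkey]
    }
}
```

Every column fails to resolve even though the data in the file is unchanged, which is why the stored copy of the schema is a liability rather than a help.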

Tajo mainly uses its catalog system to manage schemas, and reading Parquet files 
in Tajo depends on the Tajo catalog, so reading keeps working without the extra 
metadata. Other systems can still access the Parquet files by reading Parquet's 
native schema directly.
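A minimal sketch of catalog-driven reading, assuming the Parquet native schema holds only simple field names (no `db.table` qualifier) and that column names contain no dots: the reader takes the column list from the catalog, strips the qualifier, and looks each column up among the Parquet field names. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CatalogProjectionSketch {
    // Strips the "db.table." prefix, leaving the simple column name.
    // Assumes the simple name itself contains no '.'.
    static String simpleName(String qualifiedName) {
        return qualifiedName.substring(qualifiedName.lastIndexOf('.') + 1);
    }

    // For each catalog column, returns the index of the Parquet field with the
    // same simple name, or -1 if the file has no such field.
    static List<Integer> project(List<String> catalogQualifiedNames, List<String> parquetFieldNames) {
        List<Integer> indexes = new ArrayList<>();
        for (String q : catalogQualifiedNames) {
            indexes.add(parquetFieldNames.indexOf(simpleName(q)));
        }
        return indexes;
    }

    public static void main(String[] args) {
        List<String> parquetFields = List.of("o_orderkey", "o_custkey", "o_totalprice");
        // The catalog may use any qualifier; only the simple names have to match.
        List<String> catalog = List.of(
                "default.parquet_renamed.o_totalprice",
                "default.parquet_renamed.o_orderkey");
        System.out.println(project(catalog, parquetFields));
        // prints [2, 0]
    }
}
```

Because only simple names are matched, renaming the database or table does not affect reading. A column missing from the file maps to -1 here; how a real reader handles that case is a separate design decision.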



--
This message was sent by Atlassian JIRA
(v6.2#6252)
