[Carbondata-0.2.0-incubating][Issue Report] -- Select statement return error when add String column in where clause

Lu Cao Tue, 13 Dec 2016 18:13:39 -0800

Hi Dev team,
As discussed this afternoon, I've switched back to version 0.2.0 for
testing. Please ignore my earlier email about "error when save DF to
carbondata file"; that was on the master branch.


Spark version: 1.6.0
System: Mac OS X El Capitan (10.11.6)

[lucao]$ spark-shell --master local[*] --total-executor-cores 2 \
    --executor-memory 1g --num-executors 2 \
    --jars ~/MyDev/hive-1.1.1/lib/mysql-connector-java-5.1.40-bin.jar

In 0.2.0, I can successfully create a table and load data into the
carbondata table:

    scala> cc.sql("create table if not exists default.mycarbon_00001(vin String, data_date String, work_model Double) stored by 'carbondata'")

    scala> cc.sql("load data inpath 'test2.csv' into table default.mycarbon_00001")

I can successfully run the query below:

    scala> cc.sql("select vin, count(*) from default.mycarbon_00001 group by vin").show

INFO  13-12 17:13:42,215 - Job 5 finished: show at <console>:42, took 0.732793 s

+-----------------+---+
|              vin|_c1|
+-----------------+---+
|LSJW26760ES065247|464|
|LSJW26760GS018559|135|
|LSJW26761ES064611|104|
|LSJW26761FS090787| 45|
|LSJW26762ES051513| 40|
|LSJW26762FS075036|434|
|LSJW26763ES052363| 32|
|LSJW26763FS088491|305|
|LSJW26764ES064859|186|
|LSJW26764FS078696| 40|
|LSJW26765ES058651|171|
|LSJW26765FS072633|191|
|LSJW26765GS056837|467|
|LSJW26766FS070308| 79|
|LSJW26766GS050853|300|
|LSJW26767FS069913|  8|
|LSJW26767GS053454|286|
|LSJW26768FS062811| 16|
|LSJW26768GS051146| 97|
|LSJW26769FS062722|424|
+-----------------+---+

only showing top 20 rows

The error occurs when I add the "vin" column to the where clause:

    scala> cc.sql("select vin, count(*) from default.mycarbon_00001 where vin='LSJW26760ES065247' group by vin").show

+-----------------+---+
|              vin|_c1|
+-----------------+---+
|LSJW26760ES065247|464|
+-----------------+---+

>>> This one is OK. In my tests, filtering on either of the *first two
values* in the top 20 rows usually succeeds, but filtering on most of the
others returns an error.

For example :

    scala> cc.sql("select vin, count(*) from default.mycarbon_00001 where vin='LSJW26765GS056837' group by vin").show

>>> The log is attached:

<carbontest_lucao_20161213.log>


It is the same error I hit on Dec. 6th. As I said in the WeChat group
before:

       With a data set of 1000 rows, the above error did not occur.

       With a data set of 1M rows, some queries returned the error and some didn't.

       With a data set of 1.9 billion rows, all tests returned the error.
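
The per-value behaviour above could be checked systematically rather than one vin at a time. The following is a minimal sketch only: `probeFilters` and its `runQuery` parameter are hypothetical names introduced here for illustration, and the query text simply mirrors the filtered select shown earlier. It runs the filtered query once per vin and separates the values that succeed from the ones that throw.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical helper (not part of CarbonData): runs the filtered query once
// per vin value and splits the values into those that succeed and those that throw.
// `runQuery` is an assumption; it takes the SQL text and returns a row count.
def probeFilters(
    vins: Seq[String],
    runQuery: String => Long
): (Seq[String], Seq[(String, String)]) = {
  val results = vins.map { vin =>
    val query =
      s"select vin, count(*) from default.mycarbon_00001 where vin='$vin' group by vin"
    // Try captures any non-fatal exception instead of aborting the loop.
    vin -> Try(runQuery(query))
  }
  val ok = results.collect { case (vin, Success(_)) => vin }
  val failed = results.collect {
    case (vin, Failure(e)) => vin -> String.valueOf(e.getMessage)
  }
  (ok, failed)
}
```

In the same spark-shell session, one way to call it (assuming `cc` is the CarbonContext used above) would be:

    scala> val vins = cc.sql("select distinct vin from default.mycarbon_00001").collect().map(_.getString(0)).toSeq
    scala> val (ok, failed) = probeFilters(vins, q => cc.sql(q).collect().length.toLong)
    scala> failed.foreach(println)
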


*### Attached the sample data set (1M rows) for your reference.*

<<........I sent this email yesterday afternoon, but the Apache mail server
rejected it for being larger than 1000000 bytes, so I have removed the sample
data file from the attachments. If you need it, please reply with your
personal email address........>>

Looking forward to your response.


Thanks & Best Regards,

Lionel
