Hi, I just uploaded the data file to Baidu: Link: https://pan.baidu.com/s/1slERWL3 Password: m7kj
Thanks,
Lionel

On Wed, Dec 14, 2016 at 10:12 AM, Lu Cao <[email protected]> wrote:

> Hi Dev team,
>
> As discussed this afternoon, I've switched back to version 0.2.0 for
> testing. Please ignore my earlier email about "error when save DF to
> carbondata file"; that issue was on the master branch.
>
> Spark version: 1.6.0
> System: Mac OS X El Capitan (10.11.6)
>
> [lucao]$ spark-shell --master local[*] --total-executor-cores 2 \
>   --executor-memory 1g --num-executors 2 \
>   --jars ~/MyDev/hive-1.1.1/lib/mysql-connector-java-5.1.40-bin.jar
>
> In 0.2.0, I can successfully create a table and load data into the
> carbondata table:
>
> scala> cc.sql("create table if not exists default.mycarbon_00001(vin String, data_date String, work_model Double) stored by 'carbondata'")
>
> scala> cc.sql("load data inpath 'test2.csv' into table default.mycarbon_00001")
>
> The following query also runs successfully:
>
> scala> cc.sql("select vin, count(*) from default.mycarbon_00001 group by vin").show
>
> INFO 13-12 17:13:42,215 - Job 5 finished: show at <console>:42, took 0.732793 s
>
> +-----------------+---+
> |              vin|_c1|
> +-----------------+---+
> |LSJW26760ES065247|464|
> |LSJW26760GS018559|135|
> |LSJW26761ES064611|104|
> |LSJW26761FS090787| 45|
> |LSJW26762ES051513| 40|
> |LSJW26762FS075036|434|
> |LSJW26763ES052363| 32|
> |LSJW26763FS088491|305|
> |LSJW26764ES064859|186|
> |LSJW26764FS078696| 40|
> |LSJW26765ES058651|171|
> |LSJW26765FS072633|191|
> |LSJW26765GS056837|467|
> |LSJW26766FS070308| 79|
> |LSJW26766GS050853|300|
> |LSJW26767FS069913|  8|
> |LSJW26767GS053454|286|
> |LSJW26768FS062811| 16|
> |LSJW26768GS051146| 97|
> |LSJW26769FS062722|424|
> +-----------------+---+
> only showing top 20 rows
>
> The error occurs when I add the "vin" column to the where clause:
>
> scala> cc.sql("select vin, count(*) from default.mycarbon_00001 where vin='LSJW26760ES065247' group by vin")
>
> +-----------------+---+
> |              vin|_c1|
> +-----------------+---+
> |LSJW26760ES065247|464|
> +-----------------+---+
>
> >>> This one is OK. In my testing, filtering on the *first two values*
> in the top 20 rows usually succeeds, but most of the others return an
> error.
>
> For example:
>
> scala> cc.sql("select vin, count(*) from default.mycarbon_00001 where vin='LSJW26765GS056837' group by vin").show
>
> >>> Here is the log:
>
> <carbontest_lucao_20161213.log>
>
> It is the same error I hit on Dec. 6th. As I mentioned in the WeChat
> group:
>
> With 1,000 rows, the error above did not occur.
> With 1M rows, some queries returned the error and some did not.
> With 1.9 billion rows, all tests returned the error.
>
> *### Attached is the sample data set (1M rows) for your reference.*
>
> <<........I sent this email yesterday afternoon, but the Apache mail
> server rejected it because it was larger than 1,000,000 bytes, so I
> removed the sample data file from the attachments. If you need it,
> please reply with your personal email address........>>
>
> Looking forward to your response.
>
> Thanks & Best Regards,
>
> Lionel
>
