[ 
https://issues.apache.org/jira/browse/CARBONDATA-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala resolved CARBONDATA-1345.
-----------------------------------------
    Resolution: Fixed

> outdated tablemeta cache cause operation failed in multiple session
> -------------------------------------------------------------------
>
>                 Key: CARBONDATA-1345
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1345
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: xuchuanyin
>            Assignee: xuchuanyin
>            Priority: Minor
>             Fix For: 1.2.0
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> # Scenario
> ## Steps to reproduce
> Start two spark-beeline clients as two different sessions, then perform the 
> following steps in the indicated session:
> (SESSION1)
> 1. create table T_Carbn01(Active_status String,Item_type_cd INT,Qty_day_avg 
> INT,Qty_total INT,Sell_price BIGINT,Sell_pricep DOUBLE,Discount_price 
> DOUBLE,Profit DECIMAL(3,2),Item_code String,Item_name String,Outlet_name 
> String,Update_time TIMESTAMP,Create_date String) STORED BY 
> 'org.apache.carbondata.format' TBLPROPERTIES('table_blocksize'='128');
> 2. LOAD DATA INPATH 'hdfs://hacluster/user/Ram/T_Hive1.csv' INTO table 
> T_Carbn01 options ('DELIMITER'=',', 
> 'QUOTECHAR'='\', 'BAD_RECORDS_LOGGER_ENABLE'='true', 
> 'BAD_RECORDS_ACTION'='REDIRECT', 
> 'FILEHEADER'='Active_status,Item_type_cd,Qty_day_avg,Qty_total,Sell_price,Sell_pricep,Discount_price,Profit,Item_code,Item_name,Outlet_name,Update_time,Create_date');
> (SESSION2):
> 1. update t_carbn01 set(Active_status) = ('TRUE') where Item_type_cd = 41;
> (SESSION1):
> 1. Drop table t_carbn01;
> 2. create table T_Carbn01(Active_status String,Item_type_cd INT,Qty_day_avg 
> INT,Qty_total INT,Sell_price BIGINT,Sell_pricep DOUBLE,Discount_price 
> DOUBLE,Profit DECIMAL(3,2),Item_code String,Item_name String,Outlet_name 
> String,Update_time TIMESTAMP,Create_date String) STORED BY 
> 'org.apache.carbondata.format' TBLPROPERTIES('table_blocksize'='128');
> 3. LOAD DATA INPATH 'hdfs://hacluster/user/Ram/T_Hive1.csv' INTO table 
> T_Carbn01 options ('DELIMITER'=',', 
> 'QUOTECHAR'='\', 'BAD_RECORDS_LOGGER_ENABLE'='true', 
> 'BAD_RECORDS_ACTION'='REDIRECT', 
> 'FILEHEADER'='Active_status,Item_type_cd,Qty_day_avg,Qty_total,Sell_price,Sell_pricep,Discount_price,Profit,Item_code,Item_name,Outlet_name,Update_time,Create_date');
> (SESSION2):
> 1. update t_carbn01 set(Active_status) = ('TRUE') where Item_type_cd = 41;
> ## Outputs
> The error message is as below:
> ```
> Error: java.lang.RuntimeException: Update operation failed. Job aborted due 
> to stage failure: Task 0 in stage 14.0 failed 4 times, most recent failure: 
> Lost task 0.3 in stage 14.0 (TID 29, master, executor 2): 
> java.io.IOException: java.io.IOException: Dictionary file does not exist: 
> hdfs://user/hive/warehouse/carbon.store/default/t_carbn01/Metadata/ddfb3bc8-2fea-41fe-a4ff-18588df41aec.dictmeta
>     at 
> org.apache.carbondata.core.cache.dictionary.ForwardDictionaryCache.getAll(ForwardDictionaryCache.java:146)
>     at 
> org.apache.spark.sql.DictionaryLoader.loadDictionary(CarbonDictionaryDecoder.scala:686)
>     at 
> org.apache.spark.sql.DictionaryLoader.getDictionary(CarbonDictionaryDecoder.scala:703)
>     at 
> org.apache.spark.sql.ForwardDictionaryWrapper.getDictionaryValueForKeyInBytes(CarbonDictionaryDecoder.scala:654)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>     at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:378)
>     at 
> org.apache.spark.sql.execution.columnar.InMemoryRelation$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:132)
>     at 
> org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
>     at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1041)
>     at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1032)
>     at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:972)
>     at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1032)
>     at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:715)
>     at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> ```
> ## Input data
> A sample of the input data:
> ```
> TRUE,2,423,3046340,200000000003454300, 
> 121.5,4.99,2.44,SE3423ee,asfdsffdfg,EtryTRWT,2012-01-12 
> 03:14:05.123456729,2012-01-20
> TRUE,3,453,3003445,200000000000003450, 
> 121.5,4.99,2.44,SE3423ee,asfdsffdfg,ERTEerWT,2012-01-13 
> 03:24:05.123456739,2012-01-20
> TRUE,4,4350,3044364,200000000000000000, 
> 121.5,4.99,2.44,SE3423ee,asfdsffdfg,ERTtryWT,2012-01-14 
> 23:03:05.123456749,2012-01-20
> TRUE,114,4520,30000430,200000000004300000, 
> 121.5,4.99,2.44,RE3423ee,asfdsffdfg,4RTETRWT,2012-01-01 
> 23:02:05.123456819,2012-01-20
> FALSE,123,454,30000040,200000000000000000, 
> 121.5,4.99,2.44,RE3423ee,asfrewerfg,6RTETRWT,2012-01-02 
> 23:04:05.123456829,2012-01-20
> TRUE,11,4530,3000040,200000000000000000, 
> 121.5,4.99,2.44,SE3423ee,asfdsffder,TRTETRWT,2012-01-03 
> 05:04:05.123456839,2012-01-20
> TRUE,14,4590,3000400,200000000000000000, 
> 121.5,4.99,2.44,ASD423ee,asfertfdfg,HRTETRWT,2012-01-04 
> 05:06:05.123456849,2012-01-20
> FALSE,41,4250,00000,200000000000000000, 
> 121.5,4.99,2.44,SAD423ee,asrtsffdfg,HRTETRWT,2012-01-05 
> 05:07:05.123456859,2012-01-20
> TRUE,13,4510,30400,200000000000000000, 
> 121.5,4.99,2.44,DE3423ee,asfrtffdfg,YHTETRWT,2012-01-06 
> 06:08:05.123456869,2012-01-20
> ```
> # Analysis
> The error message says the dictmeta file does not exist.
> This file was actually generated during the first load operation in 
> SESSION1, and the tablemeta was cached in SESSION2 when it ran its first 
> update operation. After the DROP-CREATE-LOAD sequence in SESSION1, the old 
> dictionary files have been deleted and new dictionary files have been 
> generated. But when SESSION2 runs its second update, it still uses the 
> outdated tablemeta from its cache, which refers to the deleted dictmeta 
> files, thus causing the error.
> To solve this problem, we should refresh the cached tableMeta when the 
> corresponding table schema has been updated.
> # Solution
> Refresh the tablemeta cache when the table schema has changed.
> Since HiveSessionState.lookupRelation is slow (especially in concurrent 
> query scenarios), do not call it when the table schema has not changed.
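> The fix can be sketched roughly as follows. This is a minimal illustration, 
> not CarbonData's actual implementation: the `TableMetaCache`, `Loader`, and 
> `schemaTimestamp` names are hypothetical, standing in for the real tablemeta 
> cache and the expensive HiveSessionState.lookupRelation call. The idea is to 
> cache each table's metadata together with the schema timestamp it was read 
> from, and re-run the expensive lookup only when that timestamp changes:
> ```java
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
> 
> // Sketch of timestamp-based tablemeta cache invalidation (hypothetical names).
> class TableMetaCache {
>     // Cached metadata plus the schema timestamp it was built from.
>     static final class Entry {
>         final Object tableMeta;     // stands in for the real TableMeta
>         final long schemaTimestamp; // schema modification time when cached
>         Entry(Object tableMeta, long schemaTimestamp) {
>             this.tableMeta = tableMeta;
>             this.schemaTimestamp = schemaTimestamp;
>         }
>     }
> 
>     interface Loader {
>         // The expensive lookup, e.g. HiveSessionState.lookupRelation.
>         Object load(String tableName);
>     }
> 
>     private final Map<String, Entry> cache = new ConcurrentHashMap<>();
>     private final Loader loader;
> 
>     TableMetaCache(Loader loader) { this.loader = loader; }
> 
>     // Returns the cached metadata when the schema timestamp is unchanged;
>     // otherwise refreshes the entry via the expensive loader.
>     Object get(String tableName, long currentSchemaTimestamp) {
>         Entry e = cache.get(tableName);
>         if (e == null || e.schemaTimestamp != currentSchemaTimestamp) {
>             e = new Entry(loader.load(tableName), currentSchemaTimestamp);
>             cache.put(tableName, e);
>         }
>         return e.tableMeta;
>     }
> }
> ```
> With this shape, SESSION2's second update would observe the new schema 
> timestamp written by SESSION1's DROP-CREATE-LOAD and refresh its entry 
> instead of reusing the stale one, while unchanged tables never pay the 
> lookupRelation cost.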
> # Notes
> I have tested this scenario in my environment and it works as expected.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
