Yerui,

You're correct, serializing the dictionary isn't a good idea. I will try to initialize these big objects inside the executors instead of transferring them from the driver. I will get back to you if I run into problems. Thanks!
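
To make the idea concrete, here is a minimal sketch, not Kylin's actual code. It assumes hypothetical classes: FakePath stands in for a non-serializable type such as org.apache.hadoop.fs.Path, LazyDictionary plays the role of the dictionary, and SerializationSketch simulates the driver-to-executor hop with plain Java serialization. The path string and the dictionary contents are made up for illustration. Only the base directory string is serialized; the heavy state is marked transient and rebuilt on first use on the receiving side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.TreeMap;

// Hypothetical stand-in for a non-serializable type such as org.apache.hadoop.fs.Path.
class FakePath {
    final String uri;

    FakePath(String uri) {
        this.uri = uri;
    }
}

// Hypothetical dictionary holder: only baseDir is written by Java serialization.
class LazyDictionary implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String baseDir;                        // small, serializable
    private transient FakePath basePath;                 // skipped by serialization
    private transient TreeMap<String, Integer> sliceMap; // skipped by serialization

    LazyDictionary(String baseDir) {
        this.baseDir = baseDir;
    }

    // Rebuilds the transient state on first use, e.g. inside an executor task.
    private synchronized void ensureLoaded() {
        if (sliceMap == null) {
            basePath = new FakePath(baseDir);
            sliceMap = new TreeMap<>();
            // Real code would read the dictionary from storage here; these are made-up entries.
            sliceMap.put("a", 1);
            sliceMap.put("b", 2);
        }
    }

    synchronized boolean isLoaded() {
        return sliceMap != null;
    }

    int idOf(String value) {
        ensureLoaded();
        Integer id = sliceMap.get(value);
        return id == null ? -1 : id;
    }

    String basePathUri() {
        ensureLoaded();
        return basePath.uri;
    }
}

class SerializationSketch {
    // Serializes and deserializes an object in memory, simulating shipping it to an executor.
    static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        LazyDictionary driverSide = new LazyDictionary("hdfs:///tmp/example-dict");
        driverSide.idOf("a"); // loads the transient state on the "driver"

        LazyDictionary executorSide = roundTrip(driverSide);
        System.out.println(executorSide.isLoaded()); // false: transient fields did not travel
        System.out.println(executorSide.idOf("b"));  // 2: state rebuilt lazily on the "executor"
    }
}
```

If basePath were not marked transient, writeObject would fail with java.io.NotSerializableException, which is the same kind of failure as in the trace quoted below.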

2017-01-25 15:56 GMT+08:00 Yerui Sun <[email protected]>:

> Hi Shaofeng,
>
> Sorry for my slow response.
>
> There is indeed a serialization issue in the Spark context. Here are some
> of my opinions:
> * Some fields are initialized in the constructor, which means they do NOT
>   NEED to be serialized; we can qualify these fields with 'transient'.
> * Do we really need to serialize CachedTreeMap to the Spark executor
>   tasks? Having every task initialize its own CachedTreeMap instance may
>   be another choice.
>
> Please feel free to change the code if you really need to serialize
> CachedTreeMap, and let me know if there's somewhere I could help.
>
> > On January 20, 2017, at 14:04, ShaoFeng Shi <[email protected]> wrote:
> >
> > Hi Yerui,
> >
> > I noticed that CachedTreeMap.java uses a couple of classes from the
> > org.apache.hadoop.fs package, and you have a comment "TODO Depends on
> > HDFS for now, ideally just depends on storage interface".
> >
> > This now impacts cube building with Spark, because some classes like
> > org.apache.hadoop.fs.Path aren't serializable, while Spark relies
> > heavily on Java serialization. Building a cube with a bitmap measure
> > fails with the error below. So, can it be changed to ordinary classes
> > like String here? Thanks!
> >
> > Caused by: java.io.NotSerializableException: org.apache.hadoop.fs.Path
> > Serialization stack:
> > - object not serializable (class: org.apache.hadoop.fs.Path, value: hdfs:/kylin/kylin_default_instance/resources/GlobalDict/dict/DEFAULT.TEST_KYLIN_FACT/TEST_COUNT_DISTINCT_BITMAP)
> > - writeObject data (class: java.util.TreeMap)
> > - object (class org.apache.kylin.dict.CachedTreeMap, {=null})
> > - field (class: org.apache.kylin.dict.AppendTrieDictionary, name: dictSliceMap, type: class java.util.TreeMap)
> > - object (class org.apache.kylin.dict.AppendTrieDictionary, AppendTrieDictionary(hdfs:///kylin/kylin_default_instance/resources/GlobalDict/dict/DEFAULT.TEST_KYLIN_FACT/TEST_COUNT_DISTINCT_BITMAP/))
> > - writeObject data (class: java.util.HashMap)
> > - object (class java.util.HashMap, {
> >     DEFAULT.TEST_KYLIN_FACT.LSTG_SITE_ID=org.apache.kylin.dict.TrieDictionaryForest@f30773fa,
> >     DEFAULT.TEST_CATEGORY_GROUPINGS.CATEG_LVL2_NAME=org.apache.kylin.dict.TrieDictionaryForest@18259639,
> >     DEFAULT.TEST_CATEGORY_GROUPINGS.META_CATEG_NAME=org.apache.kylin.dict.TrieDictionaryForest@44184626,
> >     BUYER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_SELLER_LEVEL=org.apache.kylin.dict.TrieDictionaryForest@879f6439,
> >     SELLER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_SELLER_LEVEL=org.apache.kylin.dict.TrieDictionaryForest@879f6439,
> >     BUYER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_BUYER_LEVEL=org.apache.kylin.dict.TrieDictionaryForest@879f6439,
> >     SELLER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_BUYER_LEVEL=org.apache.kylin.dict.TrieDictionaryForest@879f6439,
> >     DEFAULT.TEST_KYLIN_FACT.TRANS_ID=org.apache.kylin.dict.TrieDictionaryForest@93b5aa11,
> >     DEFAULT.TEST_CATEGORY_GROUPINGS.CATEG_LVL3_NAME=org.apache.kylin.dict.TrieDictionaryForest@a494947b,
> >     SELLER_COUNTRY:DEFAULT.TEST_COUNTRY.NAME=org.apache.kylin.dict.TrieDictionaryForest@b3559b4c,
> >     BUYER_COUNTRY:DEFAULT.TEST_COUNTRY.NAME=org.apache.kylin.dict.TrieDictionaryForest@b3559b4c,
> >     SELLER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_COUNTRY=org.apache.kylin.dict.TrieDictionaryForest@410216c0,
> >     BUYER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_COUNTRY=org.apache.kylin.dict.TrieDictionaryForest@410216c0,
> >     DEFAULT.TEST_KYLIN_FACT.PRICE=org.apache.kylin.dict.TrieDictionaryForest@89f144c6,
> >     DEFAULT.TEST_KYLIN_FACT.TEST_COUNT_DISTINCT_BITMAP=AppendTrieDictionary(hdfs:///kylin/kylin_default_instance/resources/GlobalDict/dict/DEFAULT.TEST_KYLIN_FACT/TEST_COUNT_DISTINCT_BITMAP/),
> >     DEFAULT.TEST_KYLIN_FACT.LEAF_CATEG_ID=org.apache.kylin.dict.TrieDictionaryForest@25e701d0,
> >     DEFAULT.TEST_KYLIN_FACT.SLR_SEGMENT_CD=org.apache.kylin.dict.TrieDictionaryForest@dcfc7d11,
> >     DEFAULT.TEST_KYLIN_FACT.CAL_DT=DateStrDictionary [pattern=yyyy-MM-dd, baseId=0]})
> >
> > --
> > Best regards,
> >
> > Shaofeng Shi 史少锋

--
Best regards,

Shaofeng Shi 史少锋
