Yerui,

You're correct: serializing the dictionary isn't a good idea. I will try to
initialize these big objects inside the executors instead of transferring them
from the driver, and I will get back to you if I run into any problems. Thanks!
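A minimal sketch of that plan, assuming plain Java with no Spark dependency: the object shipped to tasks carries only the dictionary's String path, and a static per-JVM cache builds the dictionary the first time a task on that executor needs it. `BigDictionary`, `DictionaryCache`, `EncodeFunction` and the example path are hypothetical names, not Kylin or Spark APIs; a real loader would read the dictionary from storage instead of just wrapping the path.

```java
import java.io.*;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class ExecutorLocalDictDemo {
    // Hypothetical stand-in for a large, non-serializable dictionary.
    static final class BigDictionary {
        final String path;
        BigDictionary(String path) { this.path = path; }
    }

    // One cache per JVM (i.e. per executor). Static state is never serialized,
    // so each executor builds the dictionary locally on first use.
    static final class DictionaryCache {
        static final AtomicInteger LOADS = new AtomicInteger();
        private static final Map<String, BigDictionary> CACHE = new ConcurrentHashMap<>();

        static BigDictionary get(String path) {
            return CACHE.computeIfAbsent(path, p -> {
                LOADS.incrementAndGet();     // counts real loads, for the demo only
                return new BigDictionary(p); // a real loader would read from storage here
            });
        }
    }

    // What the driver ships to tasks: just the String path, which serializes fine.
    static final class EncodeFunction implements Function<String, String>, Serializable {
        private static final long serialVersionUID = 1L;
        private final String dictPath;

        EncodeFunction(String dictPath) { this.dictPath = dictPath; }

        @Override
        public String apply(String value) {
            BigDictionary dict = DictionaryCache.get(dictPath); // loaded lazily on the executor
            return dict.path + ":" + value;
        }
    }

    // Simulates shipping an object from driver to executor via Java serialization.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        EncodeFunction shipped =
            (EncodeFunction) roundTrip(new EncodeFunction("hdfs:///kylin/example/GlobalDict"));
        shipped.apply("a");
        shipped.apply("b");
        System.out.println("loads = " + DictionaryCache.LOADS.get()); // loads = 1
    }
}
```

In a Spark job, an object like `EncodeFunction` would be handed to a transformation such as `map` or `mapPartitions`; because static fields are not serialized, each executor JVM loads the dictionary at most once per path instead of receiving it from the driver.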

2017-01-25 15:56 GMT+08:00 Yerui Sun <[email protected]>:

> Hi ShaoFeng,
>    Sorry for my slow response.
>    There is indeed a serialization issue in the Spark context. Here are
> my opinions:
>    * Some fields are initialized in the constructor, which means they do
> NOT NEED to be serialized; we can mark these fields as ‘transient’;
>    * Do we really need to serialize CachedTreeMap to the Spark executor
> tasks? Having each task initialize its own CachedTreeMap instance is
> another choice.
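A minimal sketch of the first suggestion, assuming the non-serializable member can be rebuilt from a serializable one: the field is marked `transient` and re-created in a private `readObject` hook, because a transient field is otherwise left `null` after deserialization. `DictLike` and `StorageHandle` are hypothetical stand-ins, not the actual CachedTreeMap fields.

```java
import java.io.*;

public class TransientFieldDemo {
    // Hypothetical non-serializable helper (a stand-in for a Hadoop
    // FileSystem/Path handle) that is cheap to rebuild from a String.
    static final class StorageHandle {
        final String baseDir;
        StorageHandle(String baseDir) { this.baseDir = baseDir; }
    }

    // Hypothetical dictionary-like class: only baseDir is written to the stream.
    static class DictLike implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String baseDir;
        // Built in the constructor, so it does not need to be serialized.
        private transient StorageHandle handle;

        DictLike(String baseDir) {
            this.baseDir = baseDir;
            this.handle = new StorageHandle(baseDir);
        }

        // Rebuild transient state after deserialization; without this hook,
        // handle would be null on the receiving side.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            this.handle = new StorageHandle(baseDir);
        }

        StorageHandle handle() { return handle; }
    }

    // Serializes and deserializes, as Spark does when shipping a task.
    static DictLike roundTrip(DictLike d) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(d);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (DictLike) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        DictLike copy = roundTrip(new DictLike("hdfs:///kylin/example/GlobalDict"));
        System.out.println(copy.handle().baseDir); // hdfs:///kylin/example/GlobalDict
    }
}
```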
>
>    Please feel free to change the code if you really need to serialize
> CachedTreeMap, and let me know if there’s anything I can help with.
>
>
> On Jan 20, 2017, at 14:04, ShaoFeng Shi <[email protected]> wrote:
>
> Hi Yerui,
>
> I noticed that CachedTreeMap.java uses a couple of classes from the
> org.apache.hadoop.fs package, and you have a comment "TODO Depends
> on HDFS for now, ideally just depends on storage interface".
>
> This now impacts cube building with Spark: some classes, such as
> org.apache.hadoop.fs.Path, aren't serializable, while Spark relies heavily
> on Java serialization. Building a cube with a bitmap measure fails with the
> error below. Could it be changed to use ordinary classes such as String
> here? Thanks!
>
> Caused by: java.io.NotSerializableException: org.apache.hadoop.fs.Path
> Serialization stack:
> - object not serializable (class: org.apache.hadoop.fs.Path, value: hdfs:/kylin/kylin_default_instance/resources/GlobalDict/dict/DEFAULT.TEST_KYLIN_FACT/TEST_COUNT_DISTINCT_BITMAP)
> - writeObject data (class: java.util.TreeMap)
> - object (class org.apache.kylin.dict.CachedTreeMap, {=null})
> - field (class: org.apache.kylin.dict.AppendTrieDictionary, name: dictSliceMap, type: class java.util.TreeMap)
> - object (class org.apache.kylin.dict.AppendTrieDictionary, AppendTrieDictionary(hdfs:///kylin/kylin_default_instance/resources/GlobalDict/dict/DEFAULT.TEST_KYLIN_FACT/TEST_COUNT_DISTINCT_BITMAP/))
> - writeObject data (class: java.util.HashMap)
> - object (class java.util.HashMap, {
>     DEFAULT.TEST_KYLIN_FACT.LSTG_SITE_ID=org.apache.kylin.dict.TrieDictionaryForest@f30773fa,
>     DEFAULT.TEST_CATEGORY_GROUPINGS.CATEG_LVL2_NAME=org.apache.kylin.dict.TrieDictionaryForest@18259639,
>     DEFAULT.TEST_CATEGORY_GROUPINGS.META_CATEG_NAME=org.apache.kylin.dict.TrieDictionaryForest@44184626,
>     BUYER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_SELLER_LEVEL=org.apache.kylin.dict.TrieDictionaryForest@879f6439,
>     SELLER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_SELLER_LEVEL=org.apache.kylin.dict.TrieDictionaryForest@879f6439,
>     BUYER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_BUYER_LEVEL=org.apache.kylin.dict.TrieDictionaryForest@879f6439,
>     SELLER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_BUYER_LEVEL=org.apache.kylin.dict.TrieDictionaryForest@879f6439,
>     DEFAULT.TEST_KYLIN_FACT.TRANS_ID=org.apache.kylin.dict.TrieDictionaryForest@93b5aa11,
>     DEFAULT.TEST_CATEGORY_GROUPINGS.CATEG_LVL3_NAME=org.apache.kylin.dict.TrieDictionaryForest@a494947b,
>     SELLER_COUNTRY:DEFAULT.TEST_COUNTRY.NAME=org.apache.kylin.dict.TrieDictionaryForest@b3559b4c,
>     BUYER_COUNTRY:DEFAULT.TEST_COUNTRY.NAME=org.apache.kylin.dict.TrieDictionaryForest@b3559b4c,
>     SELLER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_COUNTRY=org.apache.kylin.dict.TrieDictionaryForest@410216c0,
>     BUYER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_COUNTRY=org.apache.kylin.dict.TrieDictionaryForest@410216c0,
>     DEFAULT.TEST_KYLIN_FACT.PRICE=org.apache.kylin.dict.TrieDictionaryForest@89f144c6,
>     DEFAULT.TEST_KYLIN_FACT.TEST_COUNT_DISTINCT_BITMAP=AppendTrieDictionary(hdfs:///kylin/kylin_default_instance/resources/GlobalDict/dict/DEFAULT.TEST_KYLIN_FACT/TEST_COUNT_DISTINCT_BITMAP/),
>     DEFAULT.TEST_KYLIN_FACT.LEAF_CATEG_ID=org.apache.kylin.dict.TrieDictionaryForest@25e701d0,
>     DEFAULT.TEST_KYLIN_FACT.SLR_SEGMENT_CD=org.apache.kylin.dict.TrieDictionaryForest@dcfc7d11,
>     DEFAULT.TEST_KYLIN_FACT.CAL_DT=DateStrDictionary [pattern=yyyy-MM-dd, baseId=0]})
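To illustrate the question above, a minimal sketch assuming a hypothetical `NonSerializablePath` class in place of `org.apache.hadoop.fs.Path` (which, in the Hadoop version used here, is not serializable, as the stack trace shows): holding the path object directly fails with `NotSerializableException`, while holding only its `String` form and building the path object on demand survives a Java serialization round trip. The class names and the path are illustrative only.

```java
import java.io.*;

public class StringPathDemo {
    // Hypothetical stand-in for org.apache.hadoop.fs.Path: it does not
    // implement java.io.Serializable.
    static final class NonSerializablePath {
        final String uri;
        NonSerializablePath(String uri) { this.uri = uri; }
    }

    // Holds the Path-like object directly: Java serialization fails.
    static class HoldsPath implements Serializable {
        private static final long serialVersionUID = 1L;
        final NonSerializablePath baseDir;
        HoldsPath(String uri) { this.baseDir = new NonSerializablePath(uri); }
    }

    // Holds only the String and builds the Path-like object when needed.
    static class HoldsString implements Serializable {
        private static final long serialVersionUID = 1L;
        final String baseDir;
        HoldsString(String uri) { this.baseDir = uri; }
        NonSerializablePath baseDirPath() { return new NonSerializablePath(baseDir); }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        String uri = "hdfs:///kylin/example/GlobalDict"; // hypothetical path
        try {
            serialize(new HoldsPath(uri));
            System.out.println("unexpected: HoldsPath serialized");
        } catch (NotSerializableException e) {
            System.out.println("HoldsPath failed: " + e.getMessage());
        }
        HoldsString copy = (HoldsString) deserialize(serialize(new HoldsString(uri)));
        System.out.println("HoldsString round trip: " + copy.baseDirPath().uri);
    }
}
```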
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
>


-- 
Best regards,

Shaofeng Shi 史少锋
