[ 
https://issues.apache.org/jira/browse/KYLIN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670403#comment-16670403
 ] 

Hubert STEFANI commented on KYLIN-3644:
---------------------------------------

Obviously it is not sufficient. I will do some further investigation, as we
have another issue with Spark on null values when computing Fact Distinct:

     diagnostics: User class threw exception: java.lang.RuntimeException:
error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause:
Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most
recent failure: Lost task 2.3 in stage 0.0 (TID 17,
ip-10-0-76-221.eu-west-1.compute.internal, executor 34):
java.lang.NumberFormatException: For input string: "\N"
    at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:589)
    at java.lang.Long.parseLong(Long.java:631)
    at
org.apache.kylin.engine.mr.steps.SelfDefineSortableKey.init(SelfDefineSortableKey.java:55)
    at
org.apache.kylin.engine.mr.steps.SelfDefineSortableKey.init(SelfDefineSortableKey.java:66)
    at
org.apache.kylin.engine.spark.SparkFactDistinct$FlatOutputFucntion.addFieldValue(SparkFactDistinct.java:444)
    at
org.apache.kylin.engine.spark.SparkFactDistinct$FlatOutputFucntion.call(SparkFactDistinct.java:315)
    at
org.apache.kylin.engine.spark.SparkFactDistinct$FlatOutputFucntion.call(SparkFactDistinct.java:226)
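
For illustration only: the string "\N" in the trace is the marker Hive's
default text serialization writes for NULL, and Long.parseLong rejects it.
Below is a minimal sketch of a null-aware parse. It is not Kylin's actual
code or fix; the class NullSafeParse and its method names are hypothetical,
and treating the empty string as null is an assumption made here.

```java
// Hypothetical helper, not part of Kylin: map Hive's NULL marker to a Java null
// before handing the value to Long.parseLong / Double.parseDouble.
final class NullSafeParse {
    // Hive's default text serialization writes NULL as the two characters \N.
    static final String HIVE_NULL = "\\N";

    private NullSafeParse() {
    }

    // Assumption: an empty field is also treated as null.
    static boolean isNullToken(String raw) {
        return raw == null || raw.isEmpty() || HIVE_NULL.equals(raw);
    }

    // Returns null for a null token; otherwise parses as Long.parseLong would.
    static Long parseLongOrNull(String raw) {
        if (isNullToken(raw)) {
            return null;
        }
        return Long.valueOf(Long.parseLong(raw));
    }

    // Returns null for a null token; otherwise parses as Double.parseDouble would.
    static Double parseDoubleOrNull(String raw) {
        if (isNullToken(raw)) {
            return null;
        }
        return Double.valueOf(Double.parseDouble(raw));
    }
}
```

With such a guard, a row whose numeric column holds "\N" would yield a null
value instead of aborting the task. Genuinely malformed input still raises
NumberFormatException.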

On Wed, 31 Oct 2018 at 14:27, Shaofeng SHI (JIRA) <[email protected]> wrote:



-- 

<http://www.novagen.tech/>
*Hubert STEFANI*
*B* : +33 1 76 21 55 40 | +33 3 59 56 16 30 | *P* : + 33 6 20 75 43 68 | *M*
: [email protected]
Paris La Défense, Les Collines de l’Arche, Immeuble Opéra E
Lille Flandres , 14 rue du Vieux Faubourg


> NumberFormatExcetion on null values when building cube with Spark
> -----------------------------------------------------------------
>
>                 Key: KYLIN-3644
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3644
>             Project: Kylin
>          Issue Type: Bug
>          Components: Spark Engine
>    Affects Versions: v2.5.0
>            Reporter: Hubert STEFANI
>            Priority: Major
>             Fix For: v2.6.0
>
>         Attachments: 00_zeppelin_notebook.jpg, 01_overview_table.jpg, 
> 02_dimension_cube.jpg, 03_measure_cube.jpg, sortieData.csv
>
>
> We encounter an error every time we try to build a cube with the following
> steps:
>  * upload a CSV file to AWS S3 in which the column on which the measure
> will be defined has some null values (cf. attachment)
>  * create a Hive table with Spark
>  * create a model on top of this table
>  * create a cube with a SUM measure
>  * choose Spark as the engine
>  * launch the build
> Result: the build process fails at '#7 Step Name: Build Cube with Spark'
> with the following error:
>  
> """"""
> 18/10/23 09:25:39 INFO scheduler.DAGScheduler: Job 0 failed: 
> saveAsNewAPIHadoopDataset at SparkCubingByLayer.java:253, took 7,277136 s
> Exception in thread "main" java.lang.RuntimeException: error execute 
> org.apache.kylin.engine.spark.SparkCubingByLayer. Root cause: Job aborted due 
> to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: 
> Lost task 0.3 in stage 0.0 (TID 4, 
> ip-172-31-35-113.eu-west-1.compute.internal, executor 4): 
> java.lang.NumberFormatException: For input string: "\N"
>     at 
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
>     at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
>     at java.lang.Double.parseDouble(Double.java:538)
>     at 
> org.apache.kylin.measure.basic.DoubleIngester.valueOf(DoubleIngester.java:38)
>     at 
> org.apache.kylin.measure.basic.DoubleIngester.valueOf(DoubleIngester.java:28)
>     at 
> org.apache.kylin.engine.mr.common.BaseCuboidBuilder.buildValueOf(BaseCuboidBuilder.java:162)
>     at 
> org.apache.kylin.engine.mr.common.BaseCuboidBuilder.buildValueObjects(BaseCuboidBuilder.java:127)
>     at 
> org.apache.kylin.engine.spark.SparkCubingByLayer$EncodeBaseCuboid.call(SparkCubingByLayer.java:297)
>     at 
> org.apache.kylin.engine.spark.SparkCubingByLayer$EncodeBaseCuboid.call(SparkCubingByLayer.java:257)
>     at 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
>     at 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
> """"""
> Note 1: the build process succeeds when run with the MapReduce engine.
> Note 2: the error doesn't seem to be related to the AWS environment.
>  
> Sample of the CSV file:
> ID;CATEGORIE;TEL;MONTANT;MAGASIN;MATRICULE;VILLE;
> 970;161;6-98-6-6-42;838.47034;Magasin_19;Client_Matricule_28;MARSEILLE;
> 971;89;62-15-2-64-86;;;Client_Matricule_1;LYON;
> 972;87;17-64-97-74-42;;;Client_Matricule_105;ORBEC;
> 973;174;79-33-90-0-55;;Magasin_7;Client_Matricule_55;AJACCIO;
> 974;172;89-95-71-6-49;141.64174;Magasin_9;Client_Matricule_105;BASTIA;
> 975;83;7-27-95-28-7;897.28204;;Client_Matricule_199;AJACCIO;
> 976;170;67-72-18-29-34;164.07967;Magasin_3;Client_Matricule_137;LILLE;
> 977;130;14-69-4-23-27;1928.9557;Magasin_1;Client_Matricule_17;NOMNOM;
> 978;43;55-91-84-98-49;891.2691;Magasin_0;Client_Matricule_22;NOMNOM;
> 979;117;98-96-0-54-39;1636.3994;Magasin_9;Client_Matricule_142;MARSEILLE;
> 980;163;37-55-76-53-38;;;Client_Matricule_64;NEWYORK;
> 981;106;32-40-6-46-15;;Magasin_2;Client_Matricule_158;NOMNOM;
> 982;56;95-60-83-89-90;;;Client_Matricule_102;NOMNOM;
> 983;168;21-56-62-0-58;;;Client_Matricule_160;NOMNOM;
> 984;154;92-67-37-94-60;;;Client_Matricule_137;PARIS;
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
