Barry Becker created SPARK-20226:
------------------------------------

             Summary: Call to sqlContext.cacheTable takes an incredibly long time in some cases
                 Key: SPARK-20226
                 URL: https://issues.apache.org/jira/browse/SPARK-20226
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.1.0
         Environment: linux or windows
            Reporter: Barry Becker


I have a case where the call to sqlContext.cacheTable can take an arbitrarily 
long time depending on the number of columns that are referenced in a 
withColumn expression applied to a dataframe.

The dataset is small (20 columns, 7861 rows). The sequence to reproduce is the 
following:
1) Add a new column based on a function of 8 to 14 other columns. 
   -- If the expression references 8 columns, the call to cacheTable is fast - about *5 seconds*
   -- If it references 11 columns, it is slow - about *60 seconds*
   -- If it references 14 columns, it basically *takes forever* - I gave up 
after 10 minutes or so.
   The Column expression that is added just concatenates the columns into a 
single string. If a number is concatenated with a string (or vice versa), the 
number is first converted to a string.
   The expression looks something like this:
{code}
`Plate` + `State` + `License Type` + `Summons Number` +
`Issue Date` + `Violation Time` + `Violation` + `Judgment Entry Date` +
`Fine Amount` + `Penalty Amount` + `Interest Amount`
{code}
          which we then convert to a Column expression that looks like this:
{code}
UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF('Plate, 'State), 'License Type),
  UDF('Summons Number)), UDF('Issue Date)), 'Violation Time), 'Violation),
  UDF('Judgment Entry Date)), UDF('Fine Amount)), UDF('Penalty Amount)),
  UDF('Interest Amount))
{code}
         where the UDFs are very simple functions that basically call toString 
and + as needed.
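
To make the shape of that expression concrete, here is a minimal sketch of how a left-nested UDF chain like the one above could be built. The names `ConcatColumns`, `plus`, `numToString` and `concatAll` are hypothetical. The UDFs actually used in this report are not shown, and the sketch assumes the numeric columns are doubles.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, udf}

object ConcatColumns {
  // Hypothetical stand-ins for the "very simple" UDFs described above.
  val plus = udf((a: String, b: String) => a + b)    // string concatenation
  val numToString = udf((n: Double) => n.toString)   // number -> string

  // A left fold, so the result nests as UDF(UDF(UDF(c1, c2), c3), c4), ...
  // Needs at least one column: reduceLeft throws on an empty Seq.
  def concatAll(cols: Seq[Column]): Column =
    cols.reduceLeft((acc, c) => plus(acc, c))

  // Only the first four columns of the reported expression, for brevity.
  val combined: Column = concatAll(Seq(
    col("Plate"),
    col("State"),
    col("License Type"),
    numToString(col("Summons Number"))
  ))
}
```

It would be applied with something like `df.withColumn("columnBasedOnManyCols", ConcatColumns.combined)`. Each additional column adds one more level of UDF nesting.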

2) Apply a previously saved pipeline that includes several transformers. 
Here are the steps of the pipeline (extracted from parquet):
 - 
{code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333200603,"sparkVersion":"2.1.0","uid":"strIdx_aeb04d2777cc","paramMap":{"handleInvalid":"skip","outputCol":"State_IDX__","inputCol":"State_CLEANED__"}}{code}
 - 
{code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333200837,"sparkVersion":"2.1.0","uid":"strIdx_0164c4c13979","paramMap":{"inputCol":"License Type_CLEANED__","handleInvalid":"skip","outputCol":"License Type_IDX__"}}{code}
 - 
{code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201068,"sparkVersion":"2.1.0","uid":"strIdx_25b6cbd02751","paramMap":{"inputCol":"Violation_CLEANED__","handleInvalid":"skip","outputCol":"Violation_IDX__"}}{code}
 - 
{code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201282,"sparkVersion":"2.1.0","uid":"strIdx_aa12df0354d9","paramMap":{"handleInvalid":"skip","inputCol":"County_CLEANED__","outputCol":"County_IDX__"}}{code}
 - 
{code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201552,"sparkVersion":"2.1.0","uid":"strIdx_babb120f3cc1","paramMap":{"handleInvalid":"skip","outputCol":"Issuing Agency_IDX__","inputCol":"Issuing Agency_CLEANED__"}}{code}
 - 
{code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201759,"sparkVersion":"2.1.0","uid":"strIdx_5f2de9d9542d","paramMap":{"handleInvalid":"skip","outputCol":"Violation Status_IDX__","inputCol":"Violation Status_CLEANED__"}}{code}
 - 
{code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333201987,"sparkVersion":"2.1.0",
    "uid":"bucketizer_6f65ca9fa813",
    "paramMap":{
      "outputCol":"Summons Number_BINNED__","handleInvalid":"keep",
      "splits":["-Inf",1.386630656E9,3.696078592E9,4.005258752E9,6.045063168E9,8.136507392E9,"Inf"],
      "inputCol":"Summons Number_CLEANED__"
    }
}{code}
 - 
{code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202079,"sparkVersion":"2.1.0",
    "uid":"bucketizer_f5db4fb8120e",
    "paramMap":{
      "splits":["-Inf",1.435215616E9,1.443855616E9,1.447271936E9,1.448222464E9,1.448395264E9,1.448481536E9,1.448827136E9,1.449259264E9,1.449432064E9,1.449518336E9,"Inf"],
      "handleInvalid":"keep","outputCol":"Issue Date_BINNED__","inputCol":"Issue Date_CLEANED__"
    }
}{code}
 - 
{code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202172,"sparkVersion":"2.1.0",
    "uid":"bucketizer_74568a2a5cfd",
    "paramMap":{
      "handleInvalid":"keep","outputCol":"Fine Amount_BINNED__","inputCol":"Fine Amount_CLEANED__",
      "splits":["-Inf",47.5,57.5,62.5,105.0,"Inf"]
    }
}{code}
 - 
{code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202269,"sparkVersion":"2.1.0",
    "uid":"bucketizer_109705dfdbcd",
    "paramMap":{"splits":["-Inf",0.004999999888241291,"Inf"],"outputCol":"Interest Amount_BINNED__","handleInvalid":"keep","inputCol":"Interest Amount_CLEANED__"}
}{code}
 - 
{code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202362,"sparkVersion":"2.1.0",
    "uid":"bucketizer_2b2e3d8a324f",
    "paramMap":{
      "handleInvalid":"keep","inputCol":"Reduction Amount_CLEANED__","outputCol":"Reduction Amount_BINNED__",
      "splits":["-Inf",5.994999885559082,24.0,41.0,57.5,120.0,"Inf"]
    }
}{code}
 - 
{code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202485,"sparkVersion":"2.1.0",
    "uid":"bucketizer_4d44c2ebf489",
    "paramMap":{
      "splits":["-Inf",18.75,42.5,52.5,57.5,70.0050048828125,75.96499633789062,100.58499908447266,115.4949951171875,125.02000427246094,"Inf"],
      "handleInvalid":"keep",
      "outputCol":"Payment Amount_BINNED__","inputCol":"Payment Amount_CLEANED__"
    }
}{code}
 - 
{code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202587,"sparkVersion":"2.1.0",
    "uid":"bucketizer_05a75eeef997",
    "paramMap":{
      "handleInvalid":"keep",
      "splits":["-Inf",32.904998779296875,55.12000274658203,72.5,91.69999694824219,116.05500030517578,125.02999877929688,"Inf"],
      "outputCol":"Amount Due_BINNED__","inputCol":"Amount Due_CLEANED__"
    }
}{code}
 - 
{code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202678,"sparkVersion":"2.1.0",
    "uid":"bucketizer_64b3ef2f97cf",
    "paramMap":{"outputCol":"Precinct_BINNED__","handleInvalid":"keep","inputCol":"Precinct_CLEANED__","splits":["-Inf",0.5,23.5,"Inf"]}
}{code}
 - 
{code}{"class":"org.apache.spark.ml.feature.VectorAssembler","timestamp":1491333202774,"sparkVersion":"2.1.0",
    "uid":"vecAssembler_932758a8f18e",
    "paramMap":{
      "outputCol":"_features_column__",
      "inputCols":["State_IDX__","License Type_IDX__","Violation_IDX__","County_IDX__","Issuing Agency_IDX__","Violation Status_IDX__","Summons Number_BINNED__","Issue Date_BINNED__","Fine Amount_BINNED__","Interest Amount_BINNED__","Reduction Amount_BINNED__","Payment Amount_BINNED__","Amount Due_BINNED__","Precinct_BINNED__"]
    }
}{code}
 - 
{code}{"class":"org.apache.spark.ml.classification.NaiveBayesModel","timestamp":1491333202874,"sparkVersion":"2.1.0",
    "uid":"nb_e4b24f3c08b0",
        "paramMap":{
          "probabilityCol":"_class_probability_column__",
          "labelCol":"Penalty Amount_BINNED__",
          "predictionCol":"_prediction_column_",
          "modelType":"multinomial",
          "featuresCol":"_features_column__",
          "rawPredictionCol":"rawPrediction",
          "smoothing":3.518236190922951E-4
         }
   }{code}
 - 
{code}{"class":"org.apache.spark.ml.feature.SQLTransformer","timestamp":1491333203106,"sparkVersion":"2.1.0",
    "uid":"sql_1ea4c1b5c52e",
    "paramMap":{"statement":"SELECT *, CAST(_prediction_column_ AS INT) AS `_*_prediction_label_column_*__` FROM __THIS__ /*cutInfo:[10.0,25.0]*/"}
}{code}
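
For reference, applying a saved pipeline like the one listed above might look roughly like the sketch below. `ApplySavedPipeline` and the `path` argument are hypothetical. The report does not show how or where the pipeline was saved and loaded.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.DataFrame

object ApplySavedPipeline {
  // `path` is a hypothetical location of the saved pipeline.
  def apply(path: String, df: DataFrame): DataFrame = {
    val model = PipelineModel.load(path)
    // Print the stage uids so they can be matched against the listing above.
    model.stages.foreach(stage => println(stage.uid))
    // Stages run in order; most feature transformers append an output column,
    // so the logical plan typically gains a node or more per stage.
    model.transform(df)
  }
}
```

With around 17 stages, this is one plausible source of the deep chain of Project and Filter nodes in the plan shown later in this report.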

3) Call cacheTable on sqlContext. The actual code used is:
{code}
    val key = "foo"
    if (sqlContext.tableNames.contains(key))
      sqlContext.dropTempTable(key)
    df.createOrReplaceTempView(key)
    sqlContext.cacheTable(key)  // <-- this takes a very long time
{code}
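
To compare the 8, 11 and 14 column cases with numbers rather than a stopwatch, a small timing wrapper can help. This is only a sketch. `Timing.timed` is a hypothetical helper, not part of Spark.

```scala
object Timing {
  // Runs `body`, prints the elapsed wall-clock time, and returns
  // the result together with the elapsed milliseconds.
  def timed[A](label: String)(body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    println(s"$label took $elapsedMs ms")
    (result, elapsedMs)
  }
}
```

The slow call above would then be wrapped as `Timing.timed("cacheTable") { sqlContext.cacheTable(key) }`.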

When I step through cacheTable in the debugger (in CacheManager.cacheQuery), I 
see that the query plan "planToCache" is very large (see below). 
I don't know much about query plans. Is this sort of giant nested query plan 
expected in this case? Is it in any way typical? Does it explain why it takes a 
very long time to cache? Why would adding just a few more columns to the 
withColumn expression result in a plan that takes exponentially longer to cache?
{code}
SubqueryAlias foo123, `foo123`
+- Project [Plate#123, State#124, License Type#125, Summons Number#126, Issue 
Date#127, Violation Time#128, Violation#129, Judgment Entry Date#130, Fine 
Amount#131, Penalty Amount#132, Interest Amount#133, Reduction Amount#134, 
Payment Amount#135, Amount Due#136, Precinct#137, County#138, Issuing 
Agency#139, Violation Status#140, columnBasedOnManyCols#141, Penalty Amount 
(predicted)#2363]
   +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, 
License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, 
Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation 
Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, 
Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, 
Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
Amount_CLEANED__#251, Reduction Amount#134, ... 33 more fields]
      +- Project [Plate#123, Plate_CLEANED__#162, State#124, 
State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons 
Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry 
Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 
33 more fields]
         +- SubqueryAlias sql_1ea4c1b5c52e_5640c7097aca, 
`sql_1ea4c1b5c52e_5640c7097aca`
            +- Project [Plate#123, Plate_CLEANED__#162, State#124, 
State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons 
Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry 
Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 
32 more fields]
               +- Project [Plate#123, Plate_CLEANED__#162, State#124, 
State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons 
Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry 
Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 
31 more fields]
                  +- Project [Plate#123, Plate_CLEANED__#162, State#124, 
State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons 
Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry 
Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 
30 more fields]
                     +- Project [Plate#123, Plate_CLEANED__#162, State#124, 
State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons 
Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry 
Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 
29 more fields]
                        +- Project [Plate#123, Plate_CLEANED__#162, State#124, 
State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons 
Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry 
Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 
28 more fields]
                           +- Project [Plate#123, Plate_CLEANED__#162, 
State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, 
Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry 
Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 
27 more fields]
                              +- Project [Plate#123, Plate_CLEANED__#162, 
State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, 
Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry 
Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 
26 more fields]
                                 +- Project [Plate#123, Plate_CLEANED__#162, 
State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, 
Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry 
Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 
25 more fields]
                                    +- Project [Plate#123, Plate_CLEANED__#162, 
State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, 
Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry 
Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250, 
Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134, ... 
24 more fields]
                                       +- Project [Plate#123, 
Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License 
Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue 
Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation 
Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry 
Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine 
Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
Amount_CLEANED__#251, Reduction Amount#134, ... 23 more fields]
                                          +- Project [Plate#123, 
Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License 
Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue 
Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation 
Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry 
Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine 
Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
Amount_CLEANED__#251, Reduction Amount#134, ... 22 more fields]
                                             +- Project [Plate#123, 
Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License 
Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue 
Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation 
Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry 
Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine 
Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
Amount_CLEANED__#251, Reduction Amount#134, ... 21 more fields]
                                                +- Project [Plate#123, 
Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License 
Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue 
Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation 
Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry 
Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine 
Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
Amount_CLEANED__#251, Reduction Amount#134, ... 20 more fields]
                                                   +- Filter UDF(Violation 
Status_CLEANED__#174)
                                                      +- Project [Plate#123, 
Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125, License 
Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126, Issue 
Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation 
Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry 
Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine 
Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
Amount_CLEANED__#251, Reduction Amount#134, ... 19 more fields]
                                                         +- Filter UDF(Issuing 
Agency_CLEANED__#173)
                                                            +- Project 
[Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License 
Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons 
Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, 
Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment 
Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine 
Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
Amount_CLEANED__#251, Reduction Amount#134, ... 18 more fields]
                                                               +- Filter 
UDF(County_CLEANED__#172)
                                                                  +- Project 
[Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License 
Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons 
Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, 
Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment 
Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine 
Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
Amount_CLEANED__#251, Reduction Amount#134, ... 17 more fields]
                                                                     +- Filter 
UDF(Violation_CLEANED__#167)
                                                                        +- 
Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, 
License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, 
Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation 
Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, 
Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, 
Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
Amount_CLEANED__#251, Reduction Amount#134, ... 16 more fields]
                                                                           +- 
Filter UDF(License Type_CLEANED__#164)
                                                                              
+- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, 
License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, 
Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation 
Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, 
Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, 
Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213, 
Interest Amount_CLEANED__#250, Interest Amount#133, Reduction 
Amount_CLEANED__#251, Reduction Amount#134, ... 15 more fields]
                                                                                
 +- Filter UDF(State_CLEANED__#163)
                                                                                
    +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, 
License Type#125, License Type_CLEANED__#164, CASE WHEN isnull(Summons 
Number#126) THEN NaN ELSE Summons Number#126 END AS Summons 
Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue 
Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166, 
Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry 
Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty 
Amount#132, Penalty Amount_CLEANED__#213, CASE WHEN isnull(Interest Amount#133) 
THEN NaN ELSE Interest Amount#133 END AS Interest Amount_CLEANED__#250, 
Interest Amount#133, CASE WHEN isnull(Reduction Amount#134) THEN NaN ELSE 
Reduction Amount#134 END AS Reduction Amount_CLEANED__#251, Reduction 
Amount#134, ... 14 more fields]
                                                                                
       +- Project [Plate#123, Plate_CLEANED__#162, State#124, 
State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons 
Number#126, Issue Date#127, CASE WHEN isnull(Issue Date_CLEANED__#165) THEN NaN 
ELSE Issue Date_CLEANED__#165 END AS Issue Date_CLEANED__#210, Violation 
Time#128, Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, 
Judgment Entry Date#130, CASE WHEN isnull(Judgment Entry Date_CLEANED__#168) 
THEN NaN ELSE Judgment Entry Date_CLEANED__#168 END AS Judgment Entry 
Date_CLEANED__#211, Fine Amount#131, CASE WHEN isnull(Fine 
Amount_CLEANED__#169) THEN NaN ELSE Fine Amount_CLEANED__#169 END AS Fine 
Amount_CLEANED__#212, Penalty Amount#132, CASE WHEN isnull(Penalty 
Amount_CLEANED__#170) THEN NaN ELSE Penalty Amount_CLEANED__#170 END AS Penalty 
Amount_CLEANED__#213, Interest Amount#133, Reduction Amount#134, Payment 
Amount#135, Amount Due#136, Precinct#137, ... 9 more fields]
                                                                                
          +- Project [Plate#123, UDF(Plate#123) AS Plate_CLEANED__#162, 
State#124, UDF(State#124) AS State_CLEANED__#163, License Type#125, UDF(License 
Type#125) AS License Type_CLEANED__#164, Summons Number#126, Issue Date#127, 
cast(Issue Date#127 as double) AS Issue Date_CLEANED__#165, Violation Time#128, 
UDF(Violation Time#128) AS Violation Time_CLEANED__#166, Violation#129, 
UDF(Violation#129) AS Violation_CLEANED__#167, Judgment Entry Date#130, 
cast(Judgment Entry Date#130 as double) AS Judgment Entry Date_CLEANED__#168, 
Fine Amount#131, cast(Fine Amount#131 as double) AS Fine Amount_CLEANED__#169, 
Penalty Amount#132, cast(Penalty Amount#132 as double) AS Penalty 
Amount_CLEANED__#170, Interest Amount#133, Reduction Amount#134, Payment 
Amount#135, Amount Due#136, Precinct#137, ... 9 more fields]
                                                                                
             +- Project [Plate#6 AS Plate#123, State#7 AS State#124, License 
Type#8 AS License Type#125, Summons Number#9 AS Summons Number#126, Issue 
Date#10 AS Issue Date#127, Violation Time#11 AS Violation Time#128, 
Violation#12 AS Violation#129, Judgment Entry Date#13 AS Judgment Entry 
Date#130, Fine Amount#14 AS Fine Amount#131, Penalty Amount#15 AS Penalty 
Amount#132, Interest Amount#16 AS Interest Amount#133, Reduction Amount#17 AS 
Reduction Amount#134, Payment Amount#18 AS Payment Amount#135, Amount Due#19 AS 
Amount Due#136, Precinct#20 AS Precinct#137, County#21 AS County#138, Issuing 
Agency#22 AS Issuing Agency#139, Violation Status#23 AS Violation Status#140, 
columnBasedOnManyCols#43 AS columnBasedOnManyCols#141]
                                                                                
                +- Project [Plate#6, State#7, License Type#8, Summons Number#9, 
Issue Date#10, Violation Time#11, Violation#12, Judgment Entry Date#13, Fine 
Amount#14, Penalty Amount#15, Interest Amount#16, Reduction Amount#17, Payment 
Amount#18, Amount Due#19, Precinct#20, County#21, Issuing Agency#22, Violation 
Status#23, cast(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(Plate#6, State#7), 
License Type#8), UDF(Summons Number#9)), UDF(Issue Date#10)), Violation 
Time#11), Violation#12), UDF(Judgment Entry Date#13)), UDF(Fine Amount#14)), 
UDF(Penalty Amount#15)), UDF(Interest Amount#16)) as string) AS 
columnBasedOnManyCols#43]
                                                                                
                   +- Relation[Plate#6,State#7,License Type#8,Summons 
Number#9,Issue Date#10,Violation Time#11,Violation#12,Judgment Entry 
Date#13,Fine Amount#14,Penalty Amount#15,Interest Amount#16,Reduction 
Amount#17,Payment Amount#18,Amount Due#19,Precinct#20,County#21,Issuing 
Agency#22,Violation Status#23] csv
{code}  
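
One way to narrow down where the time goes is to force the planning phases on the DataFrame separately, before calling cacheTable, and compare them across the 8, 11 and 14 column cases. The sketch below is a diagnostic only and does not fix anything. `PlanStats` and `Report` are hypothetical names, and it is an assumption that the slowdown would show up in these phases rather than elsewhere.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

object PlanStats {
  final case class Report(
      analyzedNodes: Int,   // operators in the analyzed logical plan
      optimizedNodes: Int,  // operators after the optimizer has run
      physicalNodes: Int,   // operators in the physical plan
      optimizeMs: Long,     // time to produce the optimized plan
      planMs: Long)         // time to produce the physical plan

  // Counts the operators in a logical plan tree.
  def nodeCount(plan: LogicalPlan): Int = plan.collect { case p => p }.size

  def report(df: DataFrame): Report = {
    val qe = df.queryExecution
    val t0 = System.nanoTime()
    val optimized = qe.optimizedPlan   // lazy val: first access runs the optimizer
    val t1 = System.nanoTime()
    val executed = qe.executedPlan     // lazy val: first access runs physical planning
    val t2 = System.nanoTime()
    Report(
      nodeCount(qe.analyzed),
      nodeCount(optimized),
      executed.collect { case p => p }.size,
      (t1 - t0) / 1000000L,
      (t2 - t1) / 1000000L)
  }
}
```

Calling `println(PlanStats.report(df))` just before `df.createOrReplaceTempView(key)` for each variant would show whether the node counts or the planning times grow with the number of referenced columns. The timings are only meaningful on a DataFrame whose plans have not been forced yet, since the lazy values are computed once.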


