[
https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959024#comment-15959024
]
Barry Becker edited comment on SPARK-20226 at 4/6/17 2:45 PM:
--------------------------------------------------------------
I set spark.sql.constraintPropagation.enabled to false in job-server local.conf
and tried again.
It did not help. It still took about 2 minutes. Oddly, setting it to true
seemed to make it worse.
I did find something that works, though. If I simply call cache() on the
dataframe right after adding the column (right after step 1 above),
it runs very quickly. The time spent in cacheTable goes from 60 seconds to
0.5 seconds. I don't understand why, though.
I thought calling cache would only help if there was branching, but the
pipeline is linear, isn't it?
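Roughly, the workaround has the shape sketched here. This is only an
illustration, not the actual job-server code: the helper name scoreAndCache and
the parameters addColumn and pipelineModel are placeholders assumed for the
example. Only cache(), transform(), createOrReplaceTempView() and cacheTable()
are real Spark calls.
{code}
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper; all parameter names are placeholders for illustration.
def scoreAndCache(
    spark: SparkSession,
    raw: DataFrame,
    addColumn: DataFrame => DataFrame, // step 1: adds columnBasedOnManyCols
    pipelineModel: PipelineModel): DataFrame = {
  // Cache right after the column is added, before the pipeline runs.
  val withCol = addColumn(raw).cache()

  // Apply the fitted pipeline to the cached dataframe.
  val result = pipelineModel.transform(withCol)

  // Register the result as a temp view and cache it by name
  // (foo123 is the alias that appears in the query plan).
  result.createOrReplaceTempView("foo123")
  spark.catalog.cacheTable("foo123")
  result
}
{code}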
Here is what the query plan looks like in the call to cache the dataframe
before transforming with the pipeline.
{code}
Project [Plate#6, State#7, License Type#8, Summons Number#9, Issue Date#10,
Violation Time#11, Violation#12, Judgment Entry Date#13, Fine Amount#14,
Penalty Amount#15, Interest Amount#16, Reduction Amount#17, Payment Amount#18,
Amount Due#19, Precinct#20, County#21, Issuing Agency#22, Violation Status#23,
cast(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(Plate#6, State#7), License
Type#8), Violation Time#11), Violation#12), UDF(Judgment Entry Date#13)),
UDF(Issue Date#10)), UDF(Summons Number#9)), UDF(Fine Amount#14)), UDF(Penalty
Amount#15)), UDF(Interest Amount#16)) as string) AS columnBasedOnManyCols#43]
+- Relation[Plate#6,State#7,License Type#8,Summons Number#9,Issue
Date#10,Violation Time#11,Violation#12,Judgment Entry Date#13,Fine
Amount#14,Penalty Amount#15,Interest Amount#16,Reduction Amount#17,Payment
Amount#18,Amount Due#19,Precinct#20,County#21,Issuing Agency#22,Violation
Status#23] csv
{code}
Here is how the query plan now looks in the call to cacheTable after
transforming with the pipeline. It looks fairly similar to what it was before,
but now it's fast.
{code}
SubqueryAlias foo123, `foo123`
+- Project [Plate#236, State#237, License Type#238, Summons Number#239, Issue
Date#240, Violation Time#241, Violation#242, Judgment Entry Date#243, Fine
Amount#244, Penalty Amount#245, Interest Amount#246, Reduction Amount#247,
Payment Amount#248, Amount Due#249, Precinct#250, County#251, Issuing
Agency#252, Violation Status#253, columnBasedOnManyCols#254, Penalty Amount
(predicted)#2476]
+- Project [Plate#236, Plate_CLEANED__#275, State#237, State_CLEANED__#276,
License Type#238, License Type_CLEANED__#277, Summons Number_CLEANED__#362,
Summons Number#239, Issue Date#240, Issue Date_CLEANED__#323, Violation
Time#241, Violation Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280,
Judgment Entry Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244,
Fine Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 33 more fields]
+- Project [Plate#236, Plate_CLEANED__#275, State#237,
State_CLEANED__#276, License Type#238, License Type_CLEANED__#277, Summons
Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
33 more fields]
+- SubqueryAlias sql_1ea4c1b5c52e_cd062499a688,
`sql_1ea4c1b5c52e_cd062499a688`
+- Project [Plate#236, Plate_CLEANED__#275, State#237,
State_CLEANED__#276, License Type#238, License Type_CLEANED__#277, Summons
Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
32 more fields]
+- Project [Plate#236, Plate_CLEANED__#275, State#237,
State_CLEANED__#276, License Type#238, License Type_CLEANED__#277, Summons
Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
31 more fields]
+- Project [Plate#236, Plate_CLEANED__#275, State#237,
State_CLEANED__#276, License Type#238, License Type_CLEANED__#277, Summons
Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
30 more fields]
+- Project [Plate#236, Plate_CLEANED__#275, State#237,
State_CLEANED__#276, License Type#238, License Type_CLEANED__#277, Summons
Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
29 more fields]
+- Project [Plate#236, Plate_CLEANED__#275, State#237,
State_CLEANED__#276, License Type#238, License Type_CLEANED__#277, Summons
Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
28 more fields]
+- Project [Plate#236, Plate_CLEANED__#275,
State#237, State_CLEANED__#276, License Type#238, License Type_CLEANED__#277,
Summons Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
27 more fields]
+- Project [Plate#236, Plate_CLEANED__#275,
State#237, State_CLEANED__#276, License Type#238, License Type_CLEANED__#277,
Summons Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
26 more fields]
+- Project [Plate#236, Plate_CLEANED__#275,
State#237, State_CLEANED__#276, License Type#238, License Type_CLEANED__#277,
Summons Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
25 more fields]
+- Project [Plate#236, Plate_CLEANED__#275,
State#237, State_CLEANED__#276, License Type#238, License Type_CLEANED__#277,
Summons Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
24 more fields]
+- Project [Plate#236,
Plate_CLEANED__#275, State#237, State_CLEANED__#276, License Type#238, License
Type_CLEANED__#277, Summons Number_CLEANED__#362, Summons Number#239, Issue
Date#240, Issue Date_CLEANED__#323, Violation Time#241, Violation
Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280, Judgment Entry
Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244, Fine
Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 23 more fields]
+- Project [Plate#236,
Plate_CLEANED__#275, State#237, State_CLEANED__#276, License Type#238, License
Type_CLEANED__#277, Summons Number_CLEANED__#362, Summons Number#239, Issue
Date#240, Issue Date_CLEANED__#323, Violation Time#241, Violation
Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280, Judgment Entry
Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244, Fine
Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 22 more fields]
+- Project [Plate#236,
Plate_CLEANED__#275, State#237, State_CLEANED__#276, License Type#238, License
Type_CLEANED__#277, Summons Number_CLEANED__#362, Summons Number#239, Issue
Date#240, Issue Date_CLEANED__#323, Violation Time#241, Violation
Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280, Judgment Entry
Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244, Fine
Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 21 more fields]
+- Project [Plate#236,
Plate_CLEANED__#275, State#237, State_CLEANED__#276, License Type#238, License
Type_CLEANED__#277, Summons Number_CLEANED__#362, Summons Number#239, Issue
Date#240, Issue Date_CLEANED__#323, Violation Time#241, Violation
Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280, Judgment Entry
Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244, Fine
Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 20 more fields]
+- Filter UDF(Violation
Status_CLEANED__#287)
+- Project [Plate#236,
Plate_CLEANED__#275, State#237, State_CLEANED__#276, License Type#238, License
Type_CLEANED__#277, Summons Number_CLEANED__#362, Summons Number#239, Issue
Date#240, Issue Date_CLEANED__#323, Violation Time#241, Violation
Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280, Judgment Entry
Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244, Fine
Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 19 more fields]
+- Filter UDF(Issuing
Agency_CLEANED__#286)
+- Project
[Plate#236, Plate_CLEANED__#275, State#237, State_CLEANED__#276, License
Type#238, License Type_CLEANED__#277, Summons Number_CLEANED__#362, Summons
Number#239, Issue Date#240, Issue Date_CLEANED__#323, Violation Time#241,
Violation Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280, Judgment
Entry Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244, Fine
Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 18 more fields]
+- Filter
UDF(County_CLEANED__#285)
+- Project
[Plate#236, Plate_CLEANED__#275, State#237, State_CLEANED__#276, License
Type#238, License Type_CLEANED__#277, Summons Number_CLEANED__#362, Summons
Number#239, Issue Date#240, Issue Date_CLEANED__#323, Violation Time#241,
Violation Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280, Judgment
Entry Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244, Fine
Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 17 more fields]
+- Filter
UDF(Violation_CLEANED__#280)
+-
Project [Plate#236, Plate_CLEANED__#275, State#237, State_CLEANED__#276,
License Type#238, License Type_CLEANED__#277, Summons Number_CLEANED__#362,
Summons Number#239, Issue Date#240, Issue Date_CLEANED__#323, Violation
Time#241, Violation Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280,
Judgment Entry Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244,
Fine Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 16 more fields]
+-
Filter UDF(License Type_CLEANED__#277)
+- Project [Plate#236, Plate_CLEANED__#275, State#237, State_CLEANED__#276,
License Type#238, License Type_CLEANED__#277, Summons Number_CLEANED__#362,
Summons Number#239, Issue Date#240, Issue Date_CLEANED__#323, Violation
Time#241, Violation Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280,
Judgment Entry Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244,
Fine Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 15 more fields]
+- Filter UDF(State_CLEANED__#276)
+- Project [Plate#236, Plate_CLEANED__#275, State#237, State_CLEANED__#276,
License Type#238, License Type_CLEANED__#277, CASE WHEN isnull(Summons
Number#239) THEN NaN ELSE Summons Number#239 END AS Summons
Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, CASE WHEN isnull(Interest Amount#246)
THEN NaN ELSE Interest Amount#246 END AS Interest Amount_CLEANED__#363,
Interest Amount#246, CASE WHEN isnull(Reduction Amount#247) THEN NaN ELSE
Reduction Amount#247 END AS Reduction Amount_CLEANED__#364, Reduction
Amount#247, ... 14 more fields]
+- Project [Plate#236, Plate_CLEANED__#275, State#237,
State_CLEANED__#276, License Type#238, License Type_CLEANED__#277, Summons
Number#239, Issue Date#240, CASE WHEN isnull(Issue Date_CLEANED__#278) THEN NaN
ELSE Issue Date_CLEANED__#278 END AS Issue Date_CLEANED__#323, Violation
Time#241, Violation Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280,
Judgment Entry Date#243, CASE WHEN isnull(Judgment Entry Date_CLEANED__#281)
THEN NaN ELSE Judgment Entry Date_CLEANED__#281 END AS Judgment Entry
Date_CLEANED__#324, Fine Amount#244, CASE WHEN isnull(Fine
Amount_CLEANED__#282) THEN NaN ELSE Fine Amount_CLEANED__#282 END AS Fine
Amount_CLEANED__#325, Penalty Amount#245, CASE WHEN isnull(Penalty
Amount_CLEANED__#283) THEN NaN ELSE Penalty Amount_CLEANED__#283 END AS Penalty
Amount_CLEANED__#326, Interest Amount#246, Reduction Amount#247, Payment
Amount#248, Amount Due#249, Precinct#250, ... 9 more fields]
+- Project [Plate#236, UDF(Plate#236) AS Plate_CLEANED__#275,
State#237, UDF(State#237) AS State_CLEANED__#276, License Type#238, UDF(License
Type#238) AS License Type_CLEANED__#277, Summons Number#239, Issue Date#240,
cast(Issue Date#240 as double) AS Issue Date_CLEANED__#278, Violation Time#241,
UDF(Violation Time#241) AS Violation Time_CLEANED__#279, Violation#242,
UDF(Violation#242) AS Violation_CLEANED__#280, Judgment Entry Date#243,
cast(Judgment Entry Date#243 as double) AS Judgment Entry Date_CLEANED__#281,
Fine Amount#244, cast(Fine Amount#244 as double) AS Fine Amount_CLEANED__#282,
Penalty Amount#245, cast(Penalty Amount#245 as double) AS Penalty
Amount_CLEANED__#283, Interest Amount#246, Reduction Amount#247, Payment
Amount#248, Amount Due#249, Precinct#250, ... 9 more fields]
+- Project [Plate#6 AS Plate#236, State#7 AS State#237, License
Type#8 AS License Type#238, Summons Number#9 AS Summons Number#239, Issue
Date#10 AS Issue Date#240, Violation Time#11 AS Violation Time#241,
Violation#12 AS Violation#242, Judgment Entry Date#13 AS Judgment Entry
Date#243, Fine Amount#14 AS Fine Amount#244, Penalty Amount#15 AS Penalty
Amount#245, Interest Amount#16 AS Interest Amount#246, Reduction Amount#17 AS
Reduction Amount#247, Payment Amount#18 AS Payment Amount#248, Amount Due#19 AS
Amount Due#249, Precinct#20 AS Precinct#250, County#21 AS County#251, Issuing
Agency#22 AS Issuing Agency#252, Violation Status#23 AS Violation Status#253,
columnBasedOnManyCols#43 AS columnBasedOnManyCols#254]
+- Project [Plate#6, State#7, License Type#8, Summons Number#9,
Issue Date#10, Violation Time#11, Violation#12, Judgment Entry Date#13, Fine
Amount#14, Penalty Amount#15, Interest Amount#16, Reduction Amount#17, Payment
Amount#18, Amount Due#19, Precinct#20, County#21, Issuing Agency#22, Violation
Status#23, cast(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(Plate#6, State#7),
License Type#8), Violation Time#11), Violation#12), UDF(Judgment Entry
Date#13)), UDF(Issue Date#10)), UDF(Summons Number#9)), UDF(Fine Amount#14)),
UDF(Penalty Amount#15)), UDF(Interest Amount#16)) as string) AS
columnBasedOnManyCols#43]
+- Relation[Plate#6,State#7,License Type#8,Summons
Number#9,Issue Date#10,Violation Time#11,Violation#12,Judgment Entry
Date#13,Fine Amount#14,Penalty Amount#15,Interest Amount#16,Reduction
Amount#17,Payment Amount#18,Amount Due#19,Precinct#20,County#21,Issuing
Agency#22,Violation Status#23] csv
{code}
Maybe this could be marked resolved. I'm not sure whether something is wrong
here or whether I just don't understand how Spark caching works.
was (Author: barrybecker4):
I set spark.sql.constraintPropagation.enabled to false in job-server local.conf
and tried again.
It did not help. It still took about 2 minutes. Oddly, setting it to true
seemed to make it worse.
I did find something that did work though. If I simply call cache() on the
dataframe after the add column (right after step 1 above)
then it runs very quickly. The time spent in cacheTable goes from 60 seconds to
0.5 seconds. I don't understand why though.
I thought calling cache would only help of there was branching, but the
pipeline is linear isn't it?
Here is what the query plan looks like in the call to cache the dataframe
before transforming with the pipeline.
{code}
Project [Plate#6, State#7, License Type#8, Summons Number#9, Issue Date#10,
Violation Time#11, Violation#12, Judgment Entry Date#13, Fine Amount#14,
Penalty Amount#15, Interest Amount#16, Reduction Amount#17, Payment Amount#18,
Amount Due#19, Precinct#20, County#21, Issuing Agency#22, Violation Status#23,
cast(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(Plate#6, State#7), License
Type#8), Violation Time#11), Violation#12), UDF(Judgment Entry Date#13)),
UDF(Issue Date#10)), UDF(Summons Number#9)), UDF(Fine Amount#14)), UDF(Penalty
Amount#15)), UDF(Interest Amount#16)) as string) AS columnBasedOnManyCols#43]
+- Relation[Plate#6,State#7,License Type#8,Summons Number#9,Issue
Date#10,Violation Time#11,Violation#12,Judgment Entry Date#13,Fine
Amount#14,Penalty Amount#15,Interest Amount#16,Reduction Amount#17,Payment
Amount#18,Amount Due#19,Precinct#20,County#21,Issuing Agency#22,Violation
Status#23] csv
{code}
Here is how the query plan now looks in the call to cacheTable after
transforming with the pipeline. Looks fairly similar to what it was before, but
now its fast.
{code}
SubqueryAlias foo123, `foo123`
+- Project [Plate#236, State#237, License Type#238, Summons Number#239, Issue
Date#240, Violation Time#241, Violation#242, Judgment Entry Date#243, Fine
Amount#244, Penalty Amount#245, Interest Amount#246, Reduction Amount#247,
Payment Amount#248, Amount Due#249, Precinct#250, County#251, Issuing
Agency#252, Violation Status#253, columnBasedOnManyCols#254, Penalty Amount
(predicted)#2476]
+- Project [Plate#236, Plate_CLEANED__#275, State#237, State_CLEANED__#276,
License Type#238, License Type_CLEANED__#277, Summons Number_CLEANED__#362,
Summons Number#239, Issue Date#240, Issue Date_CLEANED__#323, Violation
Time#241, Violation Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280,
Judgment Entry Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244,
Fine Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 33 more fields]
+- Project [Plate#236, Plate_CLEANED__#275, State#237,
State_CLEANED__#276, License Type#238, License Type_CLEANED__#277, Summons
Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
33 more fields]
+- SubqueryAlias sql_1ea4c1b5c52e_cd062499a688,
`sql_1ea4c1b5c52e_cd062499a688`
+- Project [Plate#236, Plate_CLEANED__#275, State#237,
State_CLEANED__#276, License Type#238, License Type_CLEANED__#277, Summons
Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
32 more fields]
+- Project [Plate#236, Plate_CLEANED__#275, State#237,
State_CLEANED__#276, License Type#238, License Type_CLEANED__#277, Summons
Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
31 more fields]
+- Project [Plate#236, Plate_CLEANED__#275, State#237,
State_CLEANED__#276, License Type#238, License Type_CLEANED__#277, Summons
Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
30 more fields]
+- Project [Plate#236, Plate_CLEANED__#275, State#237,
State_CLEANED__#276, License Type#238, License Type_CLEANED__#277, Summons
Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
29 more fields]
+- Project [Plate#236, Plate_CLEANED__#275, State#237,
State_CLEANED__#276, License Type#238, License Type_CLEANED__#277, Summons
Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
28 more fields]
+- Project [Plate#236, Plate_CLEANED__#275,
State#237, State_CLEANED__#276, License Type#238, License Type_CLEANED__#277,
Summons Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
27 more fields]
+- Project [Plate#236, Plate_CLEANED__#275,
State#237, State_CLEANED__#276, License Type#238, License Type_CLEANED__#277,
Summons Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
26 more fields]
+- Project [Plate#236, Plate_CLEANED__#275,
State#237, State_CLEANED__#276, License Type#238, License Type_CLEANED__#277,
Summons Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
25 more fields]
+- Project [Plate#236, Plate_CLEANED__#275,
State#237, State_CLEANED__#276, License Type#238, License Type_CLEANED__#277,
Summons Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, Interest Amount_CLEANED__#363,
Interest Amount#246, Reduction Amount_CLEANED__#364, Reduction Amount#247, ...
24 more fields]
+- Project [Plate#236,
Plate_CLEANED__#275, State#237, State_CLEANED__#276, License Type#238, License
Type_CLEANED__#277, Summons Number_CLEANED__#362, Summons Number#239, Issue
Date#240, Issue Date_CLEANED__#323, Violation Time#241, Violation
Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280, Judgment Entry
Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244, Fine
Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 23 more fields]
+- Project [Plate#236,
Plate_CLEANED__#275, State#237, State_CLEANED__#276, License Type#238, License
Type_CLEANED__#277, Summons Number_CLEANED__#362, Summons Number#239, Issue
Date#240, Issue Date_CLEANED__#323, Violation Time#241, Violation
Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280, Judgment Entry
Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244, Fine
Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 22 more fields]
+- Project [Plate#236,
Plate_CLEANED__#275, State#237, State_CLEANED__#276, License Type#238, License
Type_CLEANED__#277, Summons Number_CLEANED__#362, Summons Number#239, Issue
Date#240, Issue Date_CLEANED__#323, Violation Time#241, Violation
Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280, Judgment Entry
Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244, Fine
Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 21 more fields]
+- Project [Plate#236,
Plate_CLEANED__#275, State#237, State_CLEANED__#276, License Type#238, License
Type_CLEANED__#277, Summons Number_CLEANED__#362, Summons Number#239, Issue
Date#240, Issue Date_CLEANED__#323, Violation Time#241, Violation
Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280, Judgment Entry
Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244, Fine
Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 20 more fields]
+- Filter UDF(Violation
Status_CLEANED__#287)
+- Project [Plate#236,
Plate_CLEANED__#275, State#237, State_CLEANED__#276, License Type#238, License
Type_CLEANED__#277, Summons Number_CLEANED__#362, Summons Number#239, Issue
Date#240, Issue Date_CLEANED__#323, Violation Time#241, Violation
Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280, Judgment Entry
Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244, Fine
Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 19 more fields]
+- Filter UDF(Issuing
Agency_CLEANED__#286)
+- Project
[Plate#236, Plate_CLEANED__#275, State#237, State_CLEANED__#276, License
Type#238, License Type_CLEANED__#277, Summons Number_CLEANED__#362, Summons
Number#239, Issue Date#240, Issue Date_CLEANED__#323, Violation Time#241,
Violation Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280, Judgment
Entry Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244, Fine
Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 18 more fields]
+- Filter
UDF(County_CLEANED__#285)
+- Project
[Plate#236, Plate_CLEANED__#275, State#237, State_CLEANED__#276, License
Type#238, License Type_CLEANED__#277, Summons Number_CLEANED__#362, Summons
Number#239, Issue Date#240, Issue Date_CLEANED__#323, Violation Time#241,
Violation Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280, Judgment
Entry Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244, Fine
Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 17 more fields]
+- Filter
UDF(Violation_CLEANED__#280)
+-
Project [Plate#236, Plate_CLEANED__#275, State#237, State_CLEANED__#276,
License Type#238, License Type_CLEANED__#277, Summons Number_CLEANED__#362,
Summons Number#239, Issue Date#240, Issue Date_CLEANED__#323, Violation
Time#241, Violation Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280,
Judgment Entry Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244,
Fine Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 16 more fields]
+-
Filter UDF(License Type_CLEANED__#277)
+- Project [Plate#236, Plate_CLEANED__#275, State#237, State_CLEANED__#276,
License Type#238, License Type_CLEANED__#277, Summons Number_CLEANED__#362,
Summons Number#239, Issue Date#240, Issue Date_CLEANED__#323, Violation
Time#241, Violation Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280,
Judgment Entry Date#243, Judgment Entry Date_CLEANED__#324, Fine Amount#244,
Fine Amount_CLEANED__#325, Penalty Amount#245, Penalty Amount_CLEANED__#326,
Interest Amount_CLEANED__#363, Interest Amount#246, Reduction
Amount_CLEANED__#364, Reduction Amount#247, ... 15 more fields]
+- Filter UDF(State_CLEANED__#276)
+- Project [Plate#236, Plate_CLEANED__#275, State#237, State_CLEANED__#276,
License Type#238, License Type_CLEANED__#277, CASE WHEN isnull(Summons
Number#239) THEN NaN ELSE Summons Number#239 END AS Summons
Number_CLEANED__#362, Summons Number#239, Issue Date#240, Issue
Date_CLEANED__#323, Violation Time#241, Violation Time_CLEANED__#279,
Violation#242, Violation_CLEANED__#280, Judgment Entry Date#243, Judgment Entry
Date_CLEANED__#324, Fine Amount#244, Fine Amount_CLEANED__#325, Penalty
Amount#245, Penalty Amount_CLEANED__#326, CASE WHEN isnull(Interest Amount#246)
THEN NaN ELSE Interest Amount#246 END AS Interest Amount_CLEANED__#363,
Interest Amount#246, CASE WHEN isnull(Reduction Amount#247) THEN NaN ELSE
Reduction Amount#247 END AS Reduction Amount_CLEANED__#364, Reduction
Amount#247, ... 14 more fields]
+- Project [Plate#236, Plate_CLEANED__#275, State#237,
State_CLEANED__#276, License Type#238, License Type_CLEANED__#277, Summons
Number#239, Issue Date#240, CASE WHEN isnull(Issue Date_CLEANED__#278) THEN NaN
ELSE Issue Date_CLEANED__#278 END AS Issue Date_CLEANED__#323, Violation
Time#241, Violation Time_CLEANED__#279, Violation#242, Violation_CLEANED__#280,
Judgment Entry Date#243, CASE WHEN isnull(Judgment Entry Date_CLEANED__#281)
THEN NaN ELSE Judgment Entry Date_CLEANED__#281 END AS Judgment Entry
Date_CLEANED__#324, Fine Amount#244, CASE WHEN isnull(Fine
Amount_CLEANED__#282) THEN NaN ELSE Fine Amount_CLEANED__#282 END AS Fine
Amount_CLEANED__#325, Penalty Amount#245, CASE WHEN isnull(Penalty
Amount_CLEANED__#283) THEN NaN ELSE Penalty Amount_CLEANED__#283 END AS Penalty
Amount_CLEANED__#326, Interest Amount#246, Reduction Amount#247, Payment
Amount#248, Amount Due#249, Precinct#250, ... 9 more fields]
+- Project [Plate#236, UDF(Plate#236) AS Plate_CLEANED__#275,
State#237, UDF(State#237) AS State_CLEANED__#276, License Type#238, UDF(License
Type#238) AS License Type_CLEANED__#277, Summons Number#239, Issue Date#240,
cast(Issue Date#240 as double) AS Issue Date_CLEANED__#278, Violation Time#241,
UDF(Violation Time#241) AS Violation Time_CLEANED__#279, Violation#242,
UDF(Violation#242) AS Violation_CLEANED__#280, Judgment Entry Date#243,
cast(Judgment Entry Date#243 as double) AS Judgment Entry Date_CLEANED__#281,
Fine Amount#244, cast(Fine Amount#244 as double) AS Fine Amount_CLEANED__#282,
Penalty Amount#245, cast(Penalty Amount#245 as double) AS Penalty
Amount_CLEANED__#283, Interest Amount#246, Reduction Amount#247, Payment
Amount#248, Amount Due#249, Precinct#250, ... 9 more fields]
+- Project [Plate#6 AS Plate#236, State#7 AS State#237, License
Type#8 AS License Type#238, Summons Number#9 AS Summons Number#239, Issue
Date#10 AS Issue Date#240, Violation Time#11 AS Violation Time#241,
Violation#12 AS Violation#242, Judgment Entry Date#13 AS Judgment Entry
Date#243, Fine Amount#14 AS Fine Amount#244, Penalty Amount#15 AS Penalty
Amount#245, Interest Amount#16 AS Interest Amount#246, Reduction Amount#17 AS
Reduction Amount#247, Payment Amount#18 AS Payment Amount#248, Amount Due#19 AS
Amount Due#249, Precinct#20 AS Precinct#250, County#21 AS County#251, Issuing
Agency#22 AS Issuing Agency#252, Violation Status#23 AS Violation Status#253,
columnBasedOnManyCols#43 AS columnBasedOnManyCols#254]
+- Project [Plate#6, State#7, License Type#8, Summons Number#9,
Issue Date#10, Violation Time#11, Violation#12, Judgment Entry Date#13, Fine
Amount#14, Penalty Amount#15, Interest Amount#16, Reduction Amount#17, Payment
Amount#18, Amount Due#19, Precinct#20, County#21, Issuing Agency#22, Violation
Status#23, cast(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(Plate#6, State#7),
License Type#8), Violation Time#11), Violation#12), UDF(Judgment Entry
Date#13)), UDF(Issue Date#10)), UDF(Summons Number#9)), UDF(Fine Amount#14)),
UDF(Penalty Amount#15)), UDF(Interest Amount#16)) as string) AS
columnBasedOnManyCols#43]
+- Relation[Plate#6,State#7,License Type#8,Summons
Number#9,Issue Date#10,Violation Time#11,Violation#12,Judgment Entry
Date#13,Fine Amount#14,Penalty Amount#15,Interest Amount#16,Reduction
Amount#17,Payment Amount#18,Amount Due#19,Precinct#20,County#21,Issuing
Agency#22,Violation Status#23] csv
{code}
Maybe this could be marked resolved. I'm not sure whether something is actually
wrong here or whether I just don't understand how Spark caching works.
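For anyone trying the same workaround, it amounts to roughly the following.
This is only a sketch, not the actual job-server code: CacheWorkaroundSketch,
concat2, addConcatColumn and run are made-up names, and the real UDFs and
pipeline are the ones described in the issue below.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object CacheWorkaroundSketch {
  // Hypothetical stand-in for the simple "toString and +" UDFs in the report.
  private val concat2 = udf((a: String, b: String) => a + b)

  /** Step 1: add one column that concatenates `inputCols` left to right.
    * `inputCols` must be non-empty, otherwise reduceLeft throws. */
  def addConcatColumn(df: DataFrame, inputCols: Seq[String], outCol: String): DataFrame = {
    val combined = inputCols
      .map(c => col(c).cast("string"))               // numbers become strings first
      .reduceLeft((acc, next) => concat2(acc, next)) // UDF(UDF(a, b), c) ...
    df.withColumn(outCol, combined)
  }

  /** Steps 1 to 3, with the extra cache() right after the add column. */
  def run(spark: SparkSession, raw: DataFrame, inputCols: Seq[String],
          model: PipelineModel, viewName: String): DataFrame = {
    // The cache() call here is the workaround described above.
    val withCol = addConcatColumn(raw, inputCols, "columnBasedOnManyCols").cache()
    val transformed = model.transform(withCol)       // step 2: saved pipeline
    transformed.createOrReplaceTempView(viewName)    // step 3: register and cache
    spark.catalog.cacheTable(viewName)
    transformed
  }
}
```

A possible explanation, not confirmed anywhere in this thread: once the
dataframe is cached, later plans built on it can refer to the cached data
instead of carrying the nested UDF expression through every Project added by
the pipeline.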
> Call to sqlContext.cacheTable takes an incredibly long time in some cases
> -------------------------------------------------------------------------
>
> Key: SPARK-20226
> URL: https://issues.apache.org/jira/browse/SPARK-20226
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Environment: linux or windows
> Reporter: Barry Becker
> Labels: cache
> Attachments: profile_indexer2.PNG, xyzzy.csv
>
>
> I have a case where the call to sqlContext.cacheTable can take an arbitrarily
> long time depending on the number of columns that are referenced in a
> withColumn expression applied to a dataframe.
> The dataset is small (20 columns, 7861 rows). The sequence to reproduce is the
> following:
> 1) Add a new column that references 8 - 14 of the columns in the dataset.
> - If it references 8 columns, then the call to cacheTable is fast - like *5
> seconds*
> - If it references 11 columns, then it is slow - like *60 seconds*
> - and if it references 14 columns, then it basically *takes forever* - I gave up
> after 10 minutes or so.
> The Column expression that is added basically just concatenates
> the columns together into a single string. If a number is concatenated with a
> string (or vice versa), the number is first converted to a string.
> The expression looks something like this:
> {code}
> `Plate` + `State` + `License Type` + `Summons Number` + `Issue Date` +
> `Violation Time` + `Violation` + `Judgment Entry Date` + `Fine Amount` +
> `Penalty Amount` + `Interest Amount`
> {code}
> which we then convert to a Column expression that looks like this:
> {code}
> UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF('Plate, 'State), 'License Type),
> UDF('Summons Number)), UDF('Issue Date)), 'Violation Time), 'Violation),
> UDF('Judgment Entry Date)), UDF('Fine Amount)), UDF('Penalty Amount)),
> UDF('Interest Amount))
> {code}
> where the UDFs are very simple functions that basically call toString
> and + as needed.
> 2) Apply a previously saved pipeline that includes some transformers.
> Here are the steps of the pipeline (extracted from parquet):
> -
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333200603,"sparkVersion":"2.1.0","uid":"strIdx_aeb04d2777cc","paramMap":{"handleInvalid":"skip","outputCol":"State_IDX__","inputCol":"State_CLEANED__"}}{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333200837,"sparkVersion":"2.1.0","uid":"strIdx_0164c4c13979","paramMap":{"inputCol":"License
> Type_CLEANED__","handleInvalid":"skip","outputCol":"License
> Type_IDX__"}}{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201068,"sparkVersion":"2.1.0","uid":"strIdx_25b6cbd02751","paramMap":{"inputCol":"Violation_CLEANED__","handleInvalid":"skip","outputCol":"Violation_IDX__"}}{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201282,"sparkVersion":"2.1.0","uid":"strIdx_aa12df0354d9","paramMap":{"handleInvalid":"skip","inputCol":"County_CLEANED__","outputCol":"County_IDX__"}}{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201552,"sparkVersion":"2.1.0","uid":"strIdx_babb120f3cc1","paramMap":{"handleInvalid":"skip","outputCol":"Issuing
> Agency_IDX__","inputCol":"Issuing Agency_CLEANED__"}}{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.StringIndexerModel","timestamp":1491333201759,"sparkVersion":"2.1.0","uid":"strIdx_5f2de9d9542d","paramMap":{"handleInvalid":"skip","outputCol":"Violation
> Status_IDX__","inputCol":"Violation Status_CLEANED__"}}{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333201987,"sparkVersion":"2.1.0",
> "uid":"bucketizer_6f65ca9fa813",
> "paramMap":{
> "outputCol":"Summons
> Number_BINNED__","handleInvalid":"keep","splits":["-Inf",1.386630656E9,3.696078592E9,4.005258752E9,6.045063168E9,8.136507392E9,"Inf"],"inputCol":"Summons
> Number_CLEANED__"
> }
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202079,"sparkVersion":"2.1.0",
> "uid":"bucketizer_f5db4fb8120e",
> "paramMap":{
>
> "splits":["-Inf",1.435215616E9,1.443855616E9,1.447271936E9,1.448222464E9,1.448395264E9,1.448481536E9,1.448827136E9,1.449259264E9,1.449432064E9,1.449518336E9,"Inf"],
> "handleInvalid":"keep","outputCol":"Issue
> Date_BINNED__","inputCol":"Issue Date_CLEANED__"
> }
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202172,"sparkVersion":"2.1.0",
> "uid":"bucketizer_74568a2a5cfd",
> "paramMap":{
> "handleInvalid":"keep","outputCol":"Fine
> Amount_BINNED__","inputCol":"Fine
> Amount_CLEANED__","splits":["-Inf",47.5,57.5,62.5,105.0,"Inf"]
> }
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202269,"sparkVersion":"2.1.0",
> "uid":"bucketizer_109705dfdbcd",
>
> "paramMap":{"splits":["-Inf",0.004999999888241291,"Inf"],"outputCol":"Interest
> Amount_BINNED__","handleInvalid":"keep","inputCol":"Interest
> Amount_CLEANED__"}
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202362,"sparkVersion":"2.1.0",
> "uid":"bucketizer_2b2e3d8a324f",
> "paramMap":{
> "handleInvalid":"keep","inputCol":"Reduction
> Amount_CLEANED__","outputCol":"Reduction Amount_BINNED__",
> "splits":["-Inf",5.994999885559082,24.0,41.0,57.5,120.0,"Inf"]
> }
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202485,"sparkVersion":"2.1.0",
> "uid":"bucketizer_4d44c2ebf489",
> "paramMap":{
>
> "splits":["-Inf",18.75,42.5,52.5,57.5,70.0050048828125,75.96499633789062,100.58499908447266,115.4949951171875,125.02000427246094,"Inf"],"handleInvalid":"keep",
> "outputCol":"Payment Amount_BINNED__","inputCol":"Payment
> Amount_CLEANED__"
> }
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202587,"sparkVersion":"2.1.0",
> "uid":"bucketizer_05a75eeef997",
> "paramMap":{
> "handleInvalid":"keep",
>
> "splits":["-Inf",32.904998779296875,55.12000274658203,72.5,91.69999694824219,116.05500030517578,125.02999877929688,"Inf"],
> "outputCol":"Amount Due_BINNED__","inputCol":"Amount Due_CLEANED__"
> }
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.Bucketizer","timestamp":1491333202678,"sparkVersion":"2.1.0",
> "uid":"bucketizer_64b3ef2f97cf",
>
> "paramMap":{"outputCol":"Precinct_BINNED__","handleInvalid":"keep","inputCol":"Precinct_CLEANED__","splits":["-Inf",0.5,23.5,"Inf"]}
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.VectorAssembler","timestamp":1491333202774,"sparkVersion":"2.1.0",
> "uid":"vecAssembler_932758a8f18e",
> "paramMap":{
> "outputCol":"_features_column__",
> "inputCols":["State_IDX__","License
> Type_IDX__","Violation_IDX__","County_IDX__","Issuing
> Agency_IDX__","Violation Status_IDX__","Summons Number_BINNED__","Issue
> Date_BINNED__","Fine Amount_BINNED__","Interest Amount_BINNED__","Reduction
> Amount_BINNED__","Payment Amount_BINNED__","Amount
> Due_BINNED__","Precinct_BINNED__"]
> }
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.classification.NaiveBayesModel","timestamp":1491333202874,"sparkVersion":"2.1.0",
> "uid":"nb_e4b24f3c08b0",
> "paramMap":{
> "probabilityCol":"_class_probability_column__",
> "labelCol":"Penalty Amount_BINNED__",
> "predictionCol":"_prediction_column_",
> "modelType":"multinomial",
> "featuresCol":"_features_column__",
> "rawPredictionCol":"rawPrediction",
> "smoothing":3.518236190922951E-4
> }
> }{code}
> -
> {code}{"class":"org.apache.spark.ml.feature.SQLTransformer","timestamp":1491333203106,"sparkVersion":"2.1.0",
> "uid":"sql_1ea4c1b5c52e",
> "paramMap":{"statement":"SELECT *, CAST(_prediction_column_ AS INT) AS
> `_*_prediction_label_column_*__` FROM __THIS__ /*cutInfo:[10.0,25.0]*/"}
> }{code}
> 3) Call cacheTable on sqlContext. The actual code used is:
> {code}
> val key = "foo"
> // drop any existing temp table with this name before re-registering it
> if (sqlContext.tableNames.contains(key))
>   sqlContext.dropTempTable(key)
> df.createOrReplaceTempView(key)
> sqlContext.cacheTable(key) // <-- this takes a very long time
> {code}
> When I step through cacheTable in the debugger (in CacheManager.cacheQuery),
> I see that the query "planToCache" is very large (see below).
> I don't know much about query plans. Is this sort of giant nested query plan
> expected in this case? Is it in any way typical? Does it explain why it takes
> a very long time to cache? Why would adding just a few more columns to the
> add column expression result in a plan that takes exponentially longer?
> {code}
> SubqueryAlias foo123, `foo123`
> +- Project [Plate#123, State#124, License Type#125, Summons Number#126, Issue
> Date#127, Violation Time#128, Violation#129, Judgment Entry Date#130, Fine
> Amount#131, Penalty Amount#132, Interest Amount#133, Reduction Amount#134,
> Payment Amount#135, Amount Due#136, Precinct#137, County#138, Issuing
> Agency#139, Violation Status#140, columnBasedOnManyCols#141, Penalty Amount
> (predicted)#2363]
> +- Project [Plate#123, Plate_CLEANED__#162, State#124,
> State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons
> Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250,
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134,
> ... 33 more fields]
> +- Project [Plate#123, Plate_CLEANED__#162, State#124,
> State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons
> Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250,
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134,
> ... 33 more fields]
> +- SubqueryAlias sql_1ea4c1b5c52e_5640c7097aca,
> `sql_1ea4c1b5c52e_5640c7097aca`
> +- Project [Plate#123, Plate_CLEANED__#162, State#124,
> State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons
> Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250,
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134,
> ... 32 more fields]
> +- Project [Plate#123, Plate_CLEANED__#162, State#124,
> State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons
> Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250,
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134,
> ... 31 more fields]
> +- Project [Plate#123, Plate_CLEANED__#162, State#124,
> State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons
> Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250,
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134,
> ... 30 more fields]
> +- Project [Plate#123, Plate_CLEANED__#162, State#124,
> State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons
> Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250,
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134,
> ... 29 more fields]
> +- Project [Plate#123, Plate_CLEANED__#162,
> State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164,
> Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250,
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134,
> ... 28 more fields]
> +- Project [Plate#123, Plate_CLEANED__#162,
> State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164,
> Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250,
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134,
> ... 27 more fields]
> +- Project [Plate#123, Plate_CLEANED__#162,
> State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164,
> Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250,
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134,
> ... 26 more fields]
> +- Project [Plate#123, Plate_CLEANED__#162,
> State#124, State_CLEANED__#163, License Type#125, License Type_CLEANED__#164,
> Summons Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250,
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134,
> ... 25 more fields]
> +- Project [Plate#123,
> Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125,
> License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126,
> Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation
> Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry
> Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine
> Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213,
> Interest Amount_CLEANED__#250, Interest Amount#133, Reduction
> Amount_CLEANED__#251, Reduction Amount#134, ... 24 more fields]
> +- Project [Plate#123,
> Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125,
> License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126,
> Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation
> Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry
> Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine
> Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213,
> Interest Amount_CLEANED__#250, Interest Amount#133, Reduction
> Amount_CLEANED__#251, Reduction Amount#134, ... 23 more fields]
> +- Project [Plate#123,
> Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125,
> License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126,
> Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation
> Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry
> Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine
> Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213,
> Interest Amount_CLEANED__#250, Interest Amount#133, Reduction
> Amount_CLEANED__#251, Reduction Amount#134, ... 22 more fields]
> +- Project [Plate#123,
> Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125,
> License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126,
> Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation
> Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry
> Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine
> Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213,
> Interest Amount_CLEANED__#250, Interest Amount#133, Reduction
> Amount_CLEANED__#251, Reduction Amount#134, ... 21 more fields]
> +- Project [Plate#123,
> Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125,
> License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126,
> Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation
> Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry
> Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine
> Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213,
> Interest Amount_CLEANED__#250, Interest Amount#133, Reduction
> Amount_CLEANED__#251, Reduction Amount#134, ... 20 more fields]
> +- Filter UDF(Violation
> Status_CLEANED__#174)
> +- Project [Plate#123,
> Plate_CLEANED__#162, State#124, State_CLEANED__#163, License Type#125,
> License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons Number#126,
> Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128, Violation
> Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167, Judgment Entry
> Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131, Fine
> Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213,
> Interest Amount_CLEANED__#250, Interest Amount#133, Reduction
> Amount_CLEANED__#251, Reduction Amount#134, ... 19 more fields]
> +- Filter
> UDF(Issuing Agency_CLEANED__#173)
> +- Project
> [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License
> Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons
> Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128,
> Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167,
> Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131,
> Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213,
> Interest Amount_CLEANED__#250, Interest Amount#133, Reduction
> Amount_CLEANED__#251, Reduction Amount#134, ... 18 more fields]
> +- Filter
> UDF(County_CLEANED__#172)
> +- Project
> [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163, License
> Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249, Summons
> Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation Time#128,
> Violation Time_CLEANED__#166, Violation#129, Violation_CLEANED__#167,
> Judgment Entry Date#130, Judgment Entry Date_CLEANED__#211, Fine Amount#131,
> Fine Amount_CLEANED__#212, Penalty Amount#132, Penalty Amount_CLEANED__#213,
> Interest Amount_CLEANED__#250, Interest Amount#133, Reduction
> Amount_CLEANED__#251, Reduction Amount#134, ... 17 more fields]
> +-
> Filter UDF(Violation_CLEANED__#167)
> +-
> Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163,
> License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249,
> Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation
> Time#128, Violation Time_CLEANED__#166, Violation#129,
> Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry
> Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250,
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134,
> ... 16 more fields]
> +-
> Filter UDF(License Type_CLEANED__#164)
>
> +- Project [Plate#123, Plate_CLEANED__#162, State#124, State_CLEANED__#163,
> License Type#125, License Type_CLEANED__#164, Summons Number_CLEANED__#249,
> Summons Number#126, Issue Date#127, Issue Date_CLEANED__#210, Violation
> Time#128, Violation Time_CLEANED__#166, Violation#129,
> Violation_CLEANED__#167, Judgment Entry Date#130, Judgment Entry
> Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty
> Amount#132, Penalty Amount_CLEANED__#213, Interest Amount_CLEANED__#250,
> Interest Amount#133, Reduction Amount_CLEANED__#251, Reduction Amount#134,
> ... 15 more fields]
>
> +- Filter UDF(State_CLEANED__#163)
>
> +- Project [Plate#123, Plate_CLEANED__#162, State#124,
> State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, CASE WHEN
> isnull(Summons Number#126) THEN NaN ELSE Summons Number#126 END AS Summons
> Number_CLEANED__#249, Summons Number#126, Issue Date#127, Issue
> Date_CLEANED__#210, Violation Time#128, Violation Time_CLEANED__#166,
> Violation#129, Violation_CLEANED__#167, Judgment Entry Date#130, Judgment
> Entry Date_CLEANED__#211, Fine Amount#131, Fine Amount_CLEANED__#212, Penalty
> Amount#132, Penalty Amount_CLEANED__#213, CASE WHEN isnull(Interest
> Amount#133) THEN NaN ELSE Interest Amount#133 END AS Interest
> Amount_CLEANED__#250, Interest Amount#133, CASE WHEN isnull(Reduction
> Amount#134) THEN NaN ELSE Reduction Amount#134 END AS Reduction
> Amount_CLEANED__#251, Reduction Amount#134, ... 14 more fields]
>
> +- Project [Plate#123, Plate_CLEANED__#162, State#124,
> State_CLEANED__#163, License Type#125, License Type_CLEANED__#164, Summons
> Number#126, Issue Date#127, CASE WHEN isnull(Issue Date_CLEANED__#165) THEN
> NaN ELSE Issue Date_CLEANED__#165 END AS Issue Date_CLEANED__#210, Violation
> Time#128, Violation Time_CLEANED__#166, Violation#129,
> Violation_CLEANED__#167, Judgment Entry Date#130, CASE WHEN isnull(Judgment
> Entry Date_CLEANED__#168) THEN NaN ELSE Judgment Entry Date_CLEANED__#168 END
> AS Judgment Entry Date_CLEANED__#211, Fine Amount#131, CASE WHEN isnull(Fine
> Amount_CLEANED__#169) THEN NaN ELSE Fine Amount_CLEANED__#169 END AS Fine
> Amount_CLEANED__#212, Penalty Amount#132, CASE WHEN isnull(Penalty
> Amount_CLEANED__#170) THEN NaN ELSE Penalty Amount_CLEANED__#170 END AS
> Penalty Amount_CLEANED__#213, Interest Amount#133, Reduction Amount#134,
> Payment Amount#135, Amount Due#136, Precinct#137, ... 9 more fields]
>
> +- Project [Plate#123, UDF(Plate#123) AS Plate_CLEANED__#162,
> State#124, UDF(State#124) AS State_CLEANED__#163, License Type#125,
> UDF(License Type#125) AS License Type_CLEANED__#164, Summons Number#126,
> Issue Date#127, cast(Issue Date#127 as double) AS Issue Date_CLEANED__#165,
> Violation Time#128, UDF(Violation Time#128) AS Violation Time_CLEANED__#166,
> Violation#129, UDF(Violation#129) AS Violation_CLEANED__#167, Judgment Entry
> Date#130, cast(Judgment Entry Date#130 as double) AS Judgment Entry
> Date_CLEANED__#168, Fine Amount#131, cast(Fine Amount#131 as double) AS Fine
> Amount_CLEANED__#169, Penalty Amount#132, cast(Penalty Amount#132 as double)
> AS Penalty Amount_CLEANED__#170, Interest Amount#133, Reduction Amount#134,
> Payment Amount#135, Amount Due#136, Precinct#137, ... 9 more fields]
>
> +- Project [Plate#6 AS Plate#123, State#7 AS State#124,
> License Type#8 AS License Type#125, Summons Number#9 AS Summons Number#126,
> Issue Date#10 AS Issue Date#127, Violation Time#11 AS Violation Time#128,
> Violation#12 AS Violation#129, Judgment Entry Date#13 AS Judgment Entry
> Date#130, Fine Amount#14 AS Fine Amount#131, Penalty Amount#15 AS Penalty
> Amount#132, Interest Amount#16 AS Interest Amount#133, Reduction Amount#17 AS
> Reduction Amount#134, Payment Amount#18 AS Payment Amount#135, Amount Due#19
> AS Amount Due#136, Precinct#20 AS Precinct#137, County#21 AS County#138,
> Issuing Agency#22 AS Issuing Agency#139, Violation Status#23 AS Violation
> Status#140, columnBasedOnManyCols#43 AS columnBasedOnManyCols#141]
>
> +- Project [Plate#6, State#7, License Type#8, Summons
> Number#9, Issue Date#10, Violation Time#11, Violation#12, Judgment Entry
> Date#13, Fine Amount#14, Penalty Amount#15, Interest Amount#16, Reduction
> Amount#17, Payment Amount#18, Amount Due#19, Precinct#20, County#21, Issuing
> Agency#22, Violation Status#23,
> cast(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(UDF(Plate#6, State#7), License
> Type#8), UDF(Summons Number#9)), UDF(Issue Date#10)), Violation Time#11),
> Violation#12), UDF(Judgment Entry Date#13)), UDF(Fine Amount#14)),
> UDF(Penalty Amount#15)), UDF(Interest Amount#16)) as string) AS
> columnBasedOnManyCols#43]
>
> +- Relation[Plate#6,State#7,License Type#8,Summons
> Number#9,Issue Date#10,Violation Time#11,Violation#12,Judgment Entry
> Date#13,Fine Amount#14,Penalty Amount#15,Interest Amount#16,Reduction
> Amount#17,Payment Amount#18,Amount Due#19,Precinct#20,County#21,Issuing
> Agency#22,Violation Status#23] csv
> {code}
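The conversion in step 1 of the report, from a chain of `+` to nested UDF
calls, can be pictured with a toy model. This is plain Scala, not Spark's
Catalyst classes; Expr, Col, Udf and the helper names are invented purely for
illustration.

```scala
object NestedUdfSketch {
  // Toy expression tree, only loosely modelled on what the plan output shows.
  sealed trait Expr
  final case class Col(name: String, isString: Boolean) extends Expr
  final case class Udf(args: List[Expr]) extends Expr

  // Non-string columns get a unary "toString" UDF, as described in the report.
  def asString(c: Col): Expr = if (c.isString) c else Udf(List(c))

  // `a + b + c` becomes UDF(UDF(a, b), c): each extra operand wraps the
  // previous result in one more binary UDF. `cols` must be non-empty.
  def concatAll(cols: List[Col]): Expr =
    cols.map(asString).reduceLeft((acc, next) => Udf(List(acc, next)))

  // Nesting depth of the tree; a bare column has depth 0.
  def depth(e: Expr): Int = e match {
    case _: Col    => 0
    case Udf(args) => 1 + args.map(depth).max
  }

  // Render in the same style as the Column expression shown in the report.
  def render(e: Expr): String = e match {
    case Col(name, _) => "'" + name
    case Udf(args)    => args.map(render).mkString("UDF(", ", ", ")")
  }
}
```

In this model every additional referenced column adds one level of nesting,
which matches the shape of columnBasedOnManyCols in the plans above. It says
nothing about why the time to cache grows so quickly with that depth.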