Evan Chan created SPARK-3297:
--------------------------------
Summary: [Spark SQL][UI] SchemaRDD toString with many columns messes up Storage tab display
Key: SPARK-3297
URL: https://issues.apache.org/jira/browse/SPARK-3297
Project: Spark
Issue Type: Bug
Components: SQL, Web UI
Affects Versions: 1.0.2
Reporter: Evan Chan
Priority: Minor
When a SchemaRDD with many columns (57 in this example) is
cached using sqlContext.cacheTable, the Storage tab of the driver Web UI
renders badly: the long string form of the SchemaRDD makes the
first column much wider than the others, and in fact wider than
the browser window. It would be nice to have the first column
restricted to, say, 50% of the width of the browser window, with some minimum.
For example, this is the SchemaRDD text for my table:
RDD Storage Info for ExistingRdd
[ActionGeo_ADM1Code#198,ActionGeo_CountryCode#199,ActionGeo_FeatureID#200,ActionGeo_FullName#201,ActionGeo_Lat#202,ActionGeo_Long#203,ActionGeo_Type#204,Actor1Code#205,Actor1CountryCode#206,Actor1EthnicCode#207,Actor1Geo_ADM1Code#208,Actor1Geo_CountryCode#209,Actor1Geo_FeatureID#210,Actor1Geo_FullName#211,Actor1Geo_Lat#212,Actor1Geo_Long#213,Actor1Geo_Type#214,Actor1KnownGroupCode#215,Actor1Name#216,Actor1Religion1Code#217,Actor1Religion2Code#218,Actor1Type1Code#219,Actor1Type2Code#220,Actor1Type3Code#221,Actor2Code#222,Actor2CountryCode#223,Actor2EthnicCode#224,Actor2Geo_ADM1Code#225,Actor2Geo_CountryCode#226,Actor2Geo_FeatureID#227,Actor2Geo_FullName#228,Actor2Geo_Lat#229,Actor2Geo_Long#230,Actor2Geo_Type#231,Actor2KnownGroupCode#232,Actor2Name#233,Actor2Religion1Code#234,Actor2Religion2Code#235,Actor2Type1Code#236,Actor2Type2Code#237,Actor2Type3Code#238,AvgTone#239,DATEADDED#240,Day#241,EventBaseCode#242,EventCode#243,EventId#244,EventRootCode#245,FractionDate#246,GoldsteinScale#247,IsRootEvent#248,MonthYear#249,NumArticles#250,NumMentions#251,NumSources#252,QuadClass#253,Year#254],
MappedRDD[200]
I would personally love to fix the toString method so that it does not
necessarily print every column, but cuts the list off after a while. This
would also help the printout in the Spark Shell. For example:
[ActionGeo_ADM1Code#198,ActionGeo_CountryCode#199,ActionGeo_FeatureID#200,ActionGeo_FullName#201,ActionGeo_Lat#202
.... and 52 more columns]
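
A minimal sketch of the kind of truncation proposed above, written as a standalone helper rather than as a change to SchemaRDD itself. The object name TruncatedColumns, the method truncatedString, and the default of 5 shown columns are all hypothetical and chosen only for illustration; none of them are existing Spark APIs.

```scala
object TruncatedColumns {
  // Hypothetical helper, not part of Spark: renders at most `maxShown`
  // column names and summarizes the remainder as a count.
  def truncatedString(columns: Seq[String], maxShown: Int = 5): String = {
    require(maxShown >= 0, "maxShown must be non-negative")
    if (columns.length <= maxShown) {
      // Few enough columns: print them all, as toString does today.
      columns.mkString("[", ",", "]")
    } else {
      // Too many columns: print the first `maxShown` and count the rest.
      val hidden = columns.length - maxShown
      columns.take(maxShown).mkString("[", ",", s" .... and $hidden more columns]")
    }
  }
}
```

With the 57-column table above and the assumed default of 5, this would print the first five attribute names followed by " .... and 52 more columns]". The cutoff is a parameter so that the Web UI and the shell could pick different limits.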