[jira] [Comment Edited] (SPARK-13141) Dataframe created from Hive partitioned tables using HiveContext returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-13141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174895#comment-15174895 ]

zhichao-li edited comment on SPARK-13141 at 3/2/16 2:29 AM:

Just tried, but this cannot be reproduced on the master version by:

{code}
create table mn.logs (field1 string, field2 string, field3 string)
partitioned by (year string, month string, day string, host string)
row format delimited fields terminated by ',';

insert into logs partition (year="2013", month="07", day="28", host="host1")
values ("foo", "foo", "foo");
{code}

{code}
hc.table("logs").show()
{code}

as you mentioned. Not sure if it's specific to the CDH 5.5.1 version.

was (Author: zhichao-li):
Just tried, but this cannot be reproduced on the master version by the SQL:

{code}
create table mn.logs (field1 string, field2 string, field3 string)
partitioned by (year string, month string, day string, host string)
row format delimited fields terminated by ',';
{code}

as you mentioned. Not sure if it's specific to the CDH 5.5.1 version.

> Dataframe created from Hive partitioned tables using HiveContext returns wrong results
>
> Key: SPARK-13141
> URL: https://issues.apache.org/jira/browse/SPARK-13141
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.0
> Environment: CDH 5.5.1
> Reporter: Simone
> Priority: Critical
>
> I get wrong DataFrame results using HiveContext with Spark 1.5.0 on CDH 5.5.1 in yarn-client mode.
> The problem occurs with partitioned tables on text-delimited HDFS data, with both Scala and Python.
> This is an example of the code:
>
> import org.apache.spark.sql.hive.HiveContext
> val hc = new HiveContext(sc)
> hc.table("my_db.partition_table").show()
>
> The result is that all values of all rows are NULL, except for the first column (which contains the whole line of data) and the partitioning columns, which appear to be correct.
> With Hive and Impala I get correct results.
> Also with Spark on the same data with a non-partitioned table I get correct results.
> I think similar problems occur with Avro data as well:
> https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Pyspark-Table-Dataframe-returning-empty-records-from-Partitioned/td-p/35836
[jira] [Commented] (SPARK-13141) Dataframe created from Hive partitioned tables using HiveContext returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-13141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174895#comment-15174895 ]

zhichao-li commented on SPARK-13141:

Just tried, but this cannot be reproduced on the master version by the SQL:

{code}
create table mn.logs (field1 string, field2 string, field3 string)
partitioned by (year string, month string, day string, host string)
row format delimited fields terminated by ',';
{code}

as you mentioned. Not sure if it's specific to the CDH 5.5.1 version.

> Dataframe created from Hive partitioned tables using HiveContext returns wrong results
>
> Key: SPARK-13141
> URL: https://issues.apache.org/jira/browse/SPARK-13141
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.0
> Environment: CDH 5.5.1
> Reporter: Simone
> Priority: Critical
>
> I get wrong DataFrame results using HiveContext with Spark 1.5.0 on CDH 5.5.1 in yarn-client mode.
> The problem occurs with partitioned tables on text-delimited HDFS data, with both Scala and Python.
> This is an example of the code:
>
> import org.apache.spark.sql.hive.HiveContext
> val hc = new HiveContext(sc)
> hc.table("my_db.partition_table").show()
>
> The result is that all values of all rows are NULL, except for the first column (which contains the whole line of data) and the partitioning columns, which appear to be correct.
> With Hive and Impala I get correct results.
> Also with Spark on the same data with a non-partitioned table I get correct results.
> I think similar problems occur with Avro data as well:
> https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Pyspark-Table-Dataframe-returning-empty-records-from-Partitioned/td-p/35836
[jira] [Created] (SPARK-12820) Resolve column with full qualified names: db.table.column
zhichao-li created SPARK-12820:

Summary: Resolve column with full qualified names: db.table.column
Key: SPARK-12820
URL: https://issues.apache.org/jira/browse/SPARK-12820
Project: Spark
Issue Type: New Feature
Components: SQL
Reporter: zhichao-li
Priority: Minor

Currently Spark only supports specifying a column name like `table.col` or `col` in a projection, but it is very common for users to write `db.table.col`, especially when joining tables across databases. Hive doesn't support this for now, but it has long been supported in many other traditional databases such as MySQL.
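A minimal sketch of the requested syntax, assuming a HiveContext `hc` bound as in the other examples in this digest; the database, table, and column names are made up for illustration:

{code}
// Desired: fully qualified db.table.column references in projections and join conditions.
hc.sql("""
  SELECT sales.orders.order_id, crm.users.name
  FROM sales.orders
  JOIN crm.users ON sales.orders.user_id = crm.users.id
""")
{code}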
[jira] [Created] (SPARK-12789) Support order by index
zhichao-li created SPARK-12789:

Summary: Support order by index
Key: SPARK-12789
URL: https://issues.apache.org/jira/browse/SPARK-12789
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: zhichao-li
Priority: Minor

A number in ORDER BY is treated as a constant expression at the moment. It would be good to let users specify a column by its index, which has been supported in Hive 0.11.0 and later.
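A minimal sketch of the proposed semantics, borrowed from Hive 0.11.0+; the `hc` context and `people` table are assumed for illustration:

{code}
// Today the literal 2 is folded away as a constant expression; the request is
// that it select the second projected column (age) as the sort key, as in Hive.
hc.sql("SELECT name, age FROM people ORDER BY 2 DESC")
{code}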
[jira] [Created] (SPARK-11517) Calc partitions in parallel for multiple partitions table
zhichao-li created SPARK-11517:

Summary: Calc partitions in parallel for multiple partitions table
Key: SPARK-11517
URL: https://issues.apache.org/jira/browse/SPARK-11517
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: zhichao-li
Priority: Minor

Currently we calculate getPartitions for each Hive partition sequentially; it would be faster if we could parallelize this on the driver side.
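A minimal sketch of the idea using Scala parallel collections on the driver; `listSplitsFor` is a made-up stand-in for the per-partition metadata lookup that runs sequentially today:

{code}
// Placeholder for the metastore/HDFS listing call done once per Hive partition.
def listSplitsFor(partition: String): Seq[String] =
  Seq(s"$partition/part-00000")

val hivePartitions = Seq("year=2013/month=07", "year=2013/month=08")
// .par fans the lookups out across a thread pool instead of a sequential loop.
val allSplits = hivePartitions.par.flatMap(listSplitsFor).seq
{code}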
[jira] [Created] (SPARK-11121) Incorrect TaskLocation type
zhichao-li created SPARK-11121:

Summary: Incorrect TaskLocation type
Key: SPARK-11121
URL: https://issues.apache.org/jira/browse/SPARK-11121
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: zhichao-li
Priority: Minor

"toString" is the only difference between HostTaskLocation and HDFSCacheTaskLocation at the moment, but it would be better to correct this.
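A simplified, hypothetical rendering of the current shape (not Spark's actual source): the two classes carry the same data and differ only in how they print, so the type distinction carries no real information about HDFS caching.

{code}
sealed trait TaskLocation { def host: String }

case class HostTaskLocation(host: String) extends TaskLocation {
  override def toString: String = host
}

case class HDFSCacheTaskLocation(host: String) extends TaskLocation {
  override def toString: String = "hdfs_cache_" + host
}
{code}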
[jira] [Created] (SPARK-9626) Add python api for base64, crc32, pmod, factorial and conv functions
zhichao-li created SPARK-9626:

Summary: Add python api for base64, crc32, pmod, factorial and conv functions
Key: SPARK-9626
URL: https://issues.apache.org/jira/browse/SPARK-9626
Project: Spark
Issue Type: Improvement
Components: PySpark
Reporter: zhichao-li
Priority: Minor
[jira] [Closed] (SPARK-9626) Add python api for base64, crc32, pmod, factorial and conv functions
[ https://issues.apache.org/jira/browse/SPARK-9626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhichao-li closed SPARK-9626.

Resolution: Duplicate

Duplicate of SPARK-9513.

Add python api for base64, crc32, pmod, factorial and conv functions
Key: SPARK-9626
URL: https://issues.apache.org/jira/browse/SPARK-9626
Project: Spark
Issue Type: Improvement
Components: PySpark
Reporter: zhichao-li
Priority: Minor
[jira] [Created] (SPARK-9238) two extra useless entries for bytesOfCodePointInUTF8
zhichao-li created SPARK-9238:

Summary: two extra useless entries for bytesOfCodePointInUTF8
Key: SPARK-9238
URL: https://issues.apache.org/jira/browse/SPARK-9238
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: zhichao-li
Priority: Trivial

Only a trivial thing, and I'm not sure if I understand correctly, but I guess only 2 entries in bytesOfCodePointInUTF8 are needed for the case of 6-byte code points (first-byte pattern 1111110x). Details can be found in the Description section of https://en.wikipedia.org/wiki/UTF-8.
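For reference, a sketch of the first-byte classification such a table encodes; only the two first-byte values 0xFC and 0xFD match the 6-byte pattern 1111110x, hence the claim that two entries suffice for that case:

{code}
def bytesOfCodePointInUTF8(firstByte: Int): Int = firstByte match {
  case b if (b & 0x80) == 0x00 => 1 // 0xxxxxxx
  case b if (b & 0xE0) == 0xC0 => 2 // 110xxxxx
  case b if (b & 0xF0) == 0xE0 => 3 // 1110xxxx
  case b if (b & 0xF8) == 0xF0 => 4 // 11110xxx
  case b if (b & 0xFC) == 0xF8 => 5 // 111110xx (legacy 5-byte form)
  case b if (b & 0xFE) == 0xFC => 6 // 1111110x (legacy 6-byte form): only 0xFC and 0xFD
  case b => throw new IllegalArgumentException(f"invalid UTF-8 first byte 0x$b%02X")
}
{code}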
[jira] [Commented] (SPARK-8227) math function: unhex
[ https://issues.apache.org/jira/browse/SPARK-8227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587694#comment-14587694 ]

zhichao-li commented on SPARK-8227:

Typo, please ignore this one.

math function: unhex
Key: SPARK-8227
URL: https://issues.apache.org/jira/browse/SPARK-8227
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin
Assignee: zhichao-li

unhex(STRING a): BINARY
Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts it to the byte representation of the number.
[jira] [Commented] (SPARK-8206) math function: round
[ https://issues.apache.org/jira/browse/SPARK-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579943#comment-14579943 ]

zhichao-li commented on SPARK-8206:

I will take this one.

math function: round
Key: SPARK-8206
URL: https://issues.apache.org/jira/browse/SPARK-8206
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin

round(double a): double
Returns the rounded BIGINT value of a.
round(double a, INT d): double
Returns a, rounded to d decimal places.
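A minimal sketch of the described contract using BigDecimal HALF_UP rounding; this illustrates the semantics, not Spark's eventual implementation:

{code}
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

def round(a: Double): Long =
  new JBigDecimal(a).setScale(0, RoundingMode.HALF_UP).longValue

def round(a: Double, d: Int): Double =
  new JBigDecimal(a).setScale(d, RoundingMode.HALF_UP).doubleValue

// round(2.5) == 3L; round(3.14159, 2) == 3.14
{code}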
[jira] [Commented] (SPARK-8220) math function: positive
[ https://issues.apache.org/jira/browse/SPARK-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579961#comment-14579961 ]

zhichao-li commented on SPARK-8220:

I will take this one.

math function: positive
Key: SPARK-8220
URL: https://issues.apache.org/jira/browse/SPARK-8220
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin

positive(INT a): INT
positive(DOUBLE a): DOUBLE
This is really just an identity function. We should create an Identity expression, and then have the optimizer simply remove the Identity functions (see the sketch after this message).
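A toy sketch of the optimizer idea over a made-up expression tree (Spark's real Expression API differs): the rule strips Identity nodes wherever they appear.

{code}
sealed trait Expr
case class Literal(v: Double) extends Expr
case class Identity(child: Expr) extends Expr // what positive(a) would parse to
case class Negate(child: Expr) extends Expr

def removeIdentity(e: Expr): Expr = e match {
  case Identity(child) => removeIdentity(child)
  case Negate(child)   => Negate(removeIdentity(child))
  case lit: Literal    => lit
}

// removeIdentity(Identity(Negate(Identity(Literal(1.0))))) == Negate(Literal(1.0))
{code}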
[jira] [Commented] (SPARK-8221) math function: pmod
[ https://issues.apache.org/jira/browse/SPARK-8221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579962#comment-14579962 ]

zhichao-li commented on SPARK-8221:

I will take this one.

math function: pmod
Key: SPARK-8221
URL: https://issues.apache.org/jira/browse/SPARK-8221
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin

pmod(INT a, INT b): INT
pmod(DOUBLE a, DOUBLE b): DOUBLE
Returns the positive value of a mod b.
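A minimal sketch of the pmod contract; for a positive b the result is always non-negative, unlike the plain % operator:

{code}
def pmod(a: Int, b: Int): Int = ((a % b) + b) % b
def pmod(a: Double, b: Double): Double = ((a % b) + b) % b

// pmod(-7, 3) == 2, whereas -7 % 3 == -1
{code}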
[jira] [Commented] (SPARK-8219) math function: negative
[ https://issues.apache.org/jira/browse/SPARK-8219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579959#comment-14579959 ]

zhichao-li commented on SPARK-8219:

I will take this one.

math function: negative
Key: SPARK-8219
URL: https://issues.apache.org/jira/browse/SPARK-8219
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin

This is just an alias for UnaryMinus. Only add it to FunctionRegistry, and not DataFrame.
[jira] [Commented] (SPARK-8222) math function: alias power / pow
[ https://issues.apache.org/jira/browse/SPARK-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579963#comment-14579963 ]

zhichao-li commented on SPARK-8222:

I will take this one.

math function: alias power / pow
Key: SPARK-8222
URL: https://issues.apache.org/jira/browse/SPARK-8222
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin

Add power to FunctionRegistry.
[jira] [Commented] (SPARK-8209) math function: conv
[ https://issues.apache.org/jira/browse/SPARK-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579945#comment-14579945 ]

zhichao-li commented on SPARK-8209:

I will take this one.

math function: conv
Key: SPARK-8209
URL: https://issues.apache.org/jira/browse/SPARK-8209
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin

conv(BIGINT num, INT from_base, INT to_base): string
conv(STRING num, INT from_base, INT to_base): string
Converts a number from a given base to another (see http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_conv).
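A minimal sketch of the conv contract built on JDK radix parsing; the real Hive/Spark version also handles negative inputs and overflow clamping, which this illustration skips:

{code}
def conv(num: String, fromBase: Int, toBase: Int): String =
  java.lang.Long.toString(java.lang.Long.parseLong(num.trim, fromBase), toBase).toUpperCase

// conv("100", 2, 10) == "4"; conv("ff", 16, 2) == "11111111"
{code}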
[jira] [Commented] (SPARK-8208) math function: ceiling
[ https://issues.apache.org/jira/browse/SPARK-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579944#comment-14579944 ]

zhichao-li commented on SPARK-8208:

I will take this one.

math function: ceiling
Key: SPARK-8208
URL: https://issues.apache.org/jira/browse/SPARK-8208
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin

We already have ceil; we just need to create an alias for it in FunctionRegistry.
[jira] [Commented] (SPARK-8211) math function: radians
[ https://issues.apache.org/jira/browse/SPARK-8211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579947#comment-14579947 ]

zhichao-li commented on SPARK-8211:

I will take this one.

math function: radians
Key: SPARK-8211
URL: https://issues.apache.org/jira/browse/SPARK-8211
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin

Alias toRadians -> radians in FunctionRegistry.
[jira] [Commented] (SPARK-8213) math function: factorial
[ https://issues.apache.org/jira/browse/SPARK-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579948#comment-14579948 ]

zhichao-li commented on SPARK-8213:

I will take this one.

math function: factorial
Key: SPARK-8213
URL: https://issues.apache.org/jira/browse/SPARK-8213
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin

factorial(INT a): long
Returns the factorial of a (as of Hive 1.2.0). Valid a is [0..20].
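A minimal sketch of the contract; the [0..20] bound exists because 21! overflows a signed 64-bit long:

{code}
def factorial(a: Int): Long = {
  require(a >= 0 && a <= 20, s"factorial is only defined for [0..20], got $a")
  (1 to a).foldLeft(1L)(_ * _)
}

// factorial(0) == 1L; factorial(20) == 2432902008176640000L
{code}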
[jira] [Commented] (SPARK-8214) math function: hex
[ https://issues.apache.org/jira/browse/SPARK-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579949#comment-14579949 ]

zhichao-li commented on SPARK-8214:

I will take this one.

math function: hex
Key: SPARK-8214
URL: https://issues.apache.org/jira/browse/SPARK-8214
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin

hex(BIGINT a): string
hex(STRING a): string
hex(BINARY a): string
If the argument is an INT or binary, hex returns the number as a STRING in hexadecimal format. Otherwise, if the number is a STRING, it converts each character into its hexadecimal representation and returns the resulting STRING. (See http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_hex; the BINARY version is as of Hive 0.12.0.)
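A minimal sketch of the three overloads described above; Spark's actual expression operates on its internal types:

{code}
def hex(a: Long): String = java.lang.Long.toHexString(a).toUpperCase

def hex(a: Array[Byte]): String = a.map(b => f"$b%02X").mkString

def hex(a: String): String = hex(a.getBytes("UTF-8"))

// hex(255L) == "FF"; hex("AB") == "4142"
{code}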
[jira] [Commented] (SPARK-8210) math function: degrees
[ https://issues.apache.org/jira/browse/SPARK-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579946#comment-14579946 ]

zhichao-li commented on SPARK-8210:

I will take this one.

math function: degrees
Key: SPARK-8210
URL: https://issues.apache.org/jira/browse/SPARK-8210
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin

Alias toDegrees -> degrees.
[jira] [Commented] (SPARK-8216) math function: rename log -> ln
[ https://issues.apache.org/jira/browse/SPARK-8216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579952#comment-14579952 ]

zhichao-li commented on SPARK-8216:

I will take this one.

math function: rename log -> ln
Key: SPARK-8216
URL: https://issues.apache.org/jira/browse/SPARK-8216
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin

Rename expression Log -> Ln. Also create aliased DataFrame functions, and update FunctionRegistry.
[jira] [Commented] (SPARK-8224) math function: shiftright
[ https://issues.apache.org/jira/browse/SPARK-8224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579965#comment-14579965 ]

zhichao-li commented on SPARK-8224:

I will take this one.

math function: shiftright
Key: SPARK-8224
URL: https://issues.apache.org/jira/browse/SPARK-8224
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin

shiftright(INT a), shiftright(BIGINT a)
Bitwise right shift (as of Hive 1.2.0). Returns int for tinyint, smallint and int a. Returns bigint for bigint a.
[jira] [Commented] (SPARK-8223) math function: shiftleft
[ https://issues.apache.org/jira/browse/SPARK-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579964#comment-14579964 ]

zhichao-li commented on SPARK-8223:

I will take this one.

math function: shiftleft
Key: SPARK-8223
URL: https://issues.apache.org/jira/browse/SPARK-8223
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin

shiftleft(INT a), shiftleft(BIGINT a)
Bitwise left shift (as of Hive 1.2.0). Returns int for tinyint, smallint and int a. Returns bigint for bigint a.
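A minimal sketch of the signed-shift contract across the two input widths (covering both shiftleft and shiftright above); these reduce to the JVM's << and >> operators:

{code}
def shiftLeft(a: Int, b: Int): Int    = a << b
def shiftLeft(a: Long, b: Int): Long  = a << b
def shiftRight(a: Int, b: Int): Int   = a >> b // arithmetic shift: preserves the sign bit
def shiftRight(a: Long, b: Int): Long = a >> b

// shiftLeft(2, 3) == 16; shiftRight(-16, 2) == -4
{code}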
[jira] [Commented] (SPARK-8227) math function: unhex
[ https://issues.apache.org/jira/browse/SPARK-8227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579969#comment-14579969 ]

zhichao-li commented on SPARK-8227:

I will take this one.

math function: unhex
Key: SPARK-8227
URL: https://issues.apache.org/jira/browse/SPARK-8227
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin

unhex(STRING a): BINARY
Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts it to the byte representation of the number.
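A minimal sketch of the unhex contract: consume the string two hex characters at a time and emit one byte per pair (an odd-length input is treated here as having a leading zero, which is an assumption, not confirmed Hive behavior):

{code}
def unhex(s: String): Array[Byte] = {
  val padded = if (s.length % 2 == 1) "0" + s else s
  padded.grouped(2).map(pair => Integer.parseInt(pair, 16).toByte).toArray
}

// unhex("4142") == Array[Byte](0x41, 0x42), i.e. the bytes of "AB"
{code}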
[jira] [Commented] (SPARK-8226) math function: shiftrightunsigned
[ https://issues.apache.org/jira/browse/SPARK-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579968#comment-14579968 ]

zhichao-li commented on SPARK-8226:

I will take this one.

math function: shiftrightunsigned
Key: SPARK-8226
URL: https://issues.apache.org/jira/browse/SPARK-8226
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin

shiftrightunsigned(INT a), shiftrightunsigned(BIGINT a)
Bitwise unsigned right shift (as of Hive 1.2.0). Returns int for tinyint, smallint and int a. Returns bigint for bigint a.
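A minimal sketch contrasting the unsigned shift with the signed one; it maps to the JVM's >>> operator, which shifts in zero bits instead of copies of the sign bit:

{code}
def shiftRightUnsigned(a: Int, b: Int): Int   = a >>> b
def shiftRightUnsigned(a: Long, b: Int): Long = a >>> b

// shiftRightUnsigned(-1, 28) == 15, whereas the signed shift (-1 >> 28) == -1
{code}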
[jira] [Issue Comment Deleted] (SPARK-7119) ScriptTransform doesn't consider the output data type
[ https://issues.apache.org/jira/browse/SPARK-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhichao-li updated SPARK-7119:

Comment: was deleted
(was: This workaround query can be executed correctly, and there's a simple fix for this issue, by the way :))

ScriptTransform doesn't consider the output data type
Key: SPARK-7119
URL: https://issues.apache.org/jira/browse/SPARK-7119
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.0, 1.3.1, 1.4.0
Reporter: Cheng Hao

{code:sql}
from (from src select transform(key, value) using 'cat' as (thing1 int, thing2 string)) t select thing1 + 2;
{code}

{noformat}
15/04/24 00:58:55 ERROR CliDriver: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ClassCastException: org.apache.spark.sql.types.UTF8String cannot be cast to java.lang.Integer
    at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
    at scala.math.Numeric$IntIsIntegral$.plus(Numeric.scala:57)
    at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127)
    at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:118)
    at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68)
    at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819)
    at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:209)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
{noformat}
[jira] [Commented] (SPARK-7119) ScriptTransform doesn't consider the output data type
[ https://issues.apache.org/jira/browse/SPARK-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573889#comment-14573889 ]

zhichao-li commented on SPARK-7119:

This workaround query can be executed correctly, and there's a simple fix for this issue, by the way :)

ScriptTransform doesn't consider the output data type
Key: SPARK-7119
URL: https://issues.apache.org/jira/browse/SPARK-7119
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.0, 1.3.1, 1.4.0
Reporter: Cheng Hao

{code:sql}
from (from src select transform(key, value) using 'cat' as (thing1 int, thing2 string)) t select thing1 + 2;
{code}

(ClassCastException stack trace identical to the one quoted in the previous message.)
[jira] [Updated] (SPARK-7862) Query would hang when the using script has error output in SparkSQL
[ https://issues.apache.org/jira/browse/SPARK-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhichao-li updated SPARK-7862:

Description: Steps to reproduce:

{code}
val data = (1 to 10).map { i => (i, i, i) }
data.toDF("d1", "d2", "d3").registerTempTable("script_trans")
sql("SELECT TRANSFORM (d1, d2, d3) USING 'cat 1>&2' AS (a, b, c) FROM script_trans")
{code}

Query would hang when the using script has error output in SparkSQL
Key: SPARK-7862
URL: https://issues.apache.org/jira/browse/SPARK-7862
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: zhichao-li

Steps to reproduce: as in the description above.
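A hedged sketch of the likely failure mode and fix: if nothing drains the child's stderr, its pipe buffer fills and the script blocks, hanging the query. The names below are illustrative, not Spark's actual ScriptTransformation internals:

{code}
import java.io.{BufferedReader, InputStreamReader}

def launchWithDrainedStderr(command: Seq[String]): Process = {
  val proc = new ProcessBuilder(command: _*).start()
  val drainer = new Thread(new Runnable {
    def run(): Unit = {
      val err = new BufferedReader(new InputStreamReader(proc.getErrorStream))
      var line = err.readLine()
      while (line != null) { // consume (or log) stderr so the pipe never fills up
        System.err.println(s"script stderr: $line")
        line = err.readLine()
      }
    }
  })
  drainer.setDaemon(true)
  drainer.start()
  proc
}
{code}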
[jira] [Created] (SPARK-7862) Query would hang when the using script has error output in SparkSQL
zhichao-li created SPARK-7862:

Summary: Query would hang when the using script has error output in SparkSQL
Key: SPARK-7862
URL: https://issues.apache.org/jira/browse/SPARK-7862
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: zhichao-li
[jira] [Closed] (SPARK-6897) Remove volatile from BlockingGenerator.currentBuffer
[ https://issues.apache.org/jira/browse/SPARK-6897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhichao-li closed SPARK-6897.

Resolution: Won't Fix

There may not be much benefit in removing the volatile.

Remove volatile from BlockingGenerator.currentBuffer
Key: SPARK-6897
URL: https://issues.apache.org/jira/browse/SPARK-6897
Project: Spark
Issue Type: Improvement
Components: Streaming
Reporter: zhichao-li
Priority: Trivial

It would introduce extra performance overhead if we use both volatile and synchronized to guard the same resource (BlockingGenerator.currentBuffer).
[jira] [Created] (SPARK-6762) Fix potential resource leaks
zhichao-li created SPARK-6762:

Summary: Fix potential resource leaks
Key: SPARK-6762
URL: https://issues.apache.org/jira/browse/SPARK-6762
Project: Spark
Issue Type: Bug
Components: Streaming
Reporter: zhichao-li
Priority: Minor
[jira] [Updated] (SPARK-6762) Fix potential resource leaks in CheckPoint CheckpointWriter and CheckpointReader
[ https://issues.apache.org/jira/browse/SPARK-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhichao-li updated SPARK-6762:

Summary: Fix potential resource leaks in CheckPoint CheckpointWriter and CheckpointReader (was: Fix potential resource leaks)

Fix potential resource leaks in CheckPoint CheckpointWriter and CheckpointReader
Key: SPARK-6762
URL: https://issues.apache.org/jira/browse/SPARK-6762
Project: Spark
Issue Type: Bug
Components: Streaming
Reporter: zhichao-li
Priority: Minor
[jira] [Updated] (SPARK-6762) Fix potential resource leaks in CheckPoint CheckpointWriter and CheckpointReader
[ https://issues.apache.org/jira/browse/SPARK-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhichao-li updated SPARK-6762:

Description: The close action should be placed within a finally block to avoid potential resource leaks.

Fix potential resource leaks in CheckPoint CheckpointWriter and CheckpointReader
Key: SPARK-6762
URL: https://issues.apache.org/jira/browse/SPARK-6762
Project: Spark
Issue Type: Bug
Components: Streaming
Reporter: zhichao-li
Priority: Minor

The close action should be placed within a finally block to avoid potential resource leaks.
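A minimal sketch of the pattern the fix proposes, with illustrative names: closing in a finally block releases the stream even when the write throws.

{code}
import java.io.OutputStream

def writeCheckpoint(out: OutputStream, bytes: Array[Byte]): Unit = {
  try {
    out.write(bytes)
    out.flush()
  } finally {
    out.close() // runs on success and failure alike, so the handle never leaks
  }
}
{code}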
[jira] [Commented] (SPARK-6613) Starting stream from checkpoint causes Streaming tab to throw error
[ https://issues.apache.org/jira/browse/SPARK-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390221#comment-14390221 ]

zhichao-li commented on SPARK-6613:

[~msoutier], have you found any solution for this, or did you just report the bug?

Starting stream from checkpoint causes Streaming tab to throw error
Key: SPARK-6613
URL: https://issues.apache.org/jira/browse/SPARK-6613
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.2.1
Reporter: Marius Soutier

When continuing my streaming job from a checkpoint, the job runs, but the Streaming tab in the standard UI initially no longer works (the browser just shows HTTP ERROR: 500). Sometimes it gets back to normal after a while, and sometimes it stays in this state permanently.

Stacktrace:
{noformat}
WARN org.eclipse.jetty.servlet.ServletHandler: /streaming/
java.util.NoSuchElementException: key not found: 0
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at scala.collection.AbstractMap.default(Map.scala:58)
    at scala.collection.MapLike$class.apply(MapLike.scala:141)
    at scala.collection.AbstractMap.apply(Map.scala:58)
    at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1$$anonfun$apply$5.apply(StreamingJobProgressListener.scala:151)
    at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1$$anonfun$apply$5.apply(StreamingJobProgressListener.scala:150)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.Range.foreach(Range.scala:141)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1.apply(StreamingJobProgressListener.scala:150)
    at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1.apply(StreamingJobProgressListener.scala:149)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.streaming.ui.StreamingJobProgressListener.lastReceivedBatchRecords(StreamingJobProgressListener.scala:149)
    at org.apache.spark.streaming.ui.StreamingPage.generateReceiverStats(StreamingPage.scala:82)
    at org.apache.spark.streaming.ui.StreamingPage.render(StreamingPage.scala:43)
    at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68)
    at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68)
    at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:68)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:370)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
[jira] [Commented] (SPARK-6613) Starting stream from checkpoint causes Streaming tab to throw error
[ https://issues.apache.org/jira/browse/SPARK-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392102#comment-14392102 ]

zhichao-li commented on SPARK-6613:

Just trying to understand the issue, but it can't be reproduced on my side. If possible, could you elaborate on how to reproduce it, i.e. a code snippet or steps?

Starting stream from checkpoint causes Streaming tab to throw error
Key: SPARK-6613
URL: https://issues.apache.org/jira/browse/SPARK-6613
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.2.1
Reporter: Marius Soutier

When continuing my streaming job from a checkpoint, the job runs, but the Streaming tab in the standard UI initially no longer works (the browser just shows HTTP ERROR: 500). Sometimes it gets back to normal after a while, and sometimes it stays in this state permanently.

(NoSuchElementException stack trace identical to the one quoted in the previous message.)
[jira] [Commented] (SPARK-6077) Multiple spark streaming tabs on UI when reuse the same sparkcontext
[ https://issues.apache.org/jira/browse/SPARK-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342595#comment-14342595 ]

zhichao-li commented on SPARK-6077:

Yeah, it would fix SPARK-2463 as well. It's almost the same case, although most of the comments on that JIRA are about stopping concurrently running StreamingContexts in the same JVM.

Multiple spark streaming tabs on UI when reuse the same sparkcontext
Key: SPARK-6077
URL: https://issues.apache.org/jira/browse/SPARK-6077
Project: Spark
Issue Type: Bug
Components: Streaming, Web UI
Reporter: zhichao-li
Priority: Minor

Currently we create a new streaming tab for each StreamingContext even if there's already one on the same SparkContext, which causes duplicate StreamingTabs to be created, none of which takes effect.

(Repro steps and snapshot as in the creation message below.)
[jira] [Created] (SPARK-6077) Multiple spark streaming tabs on UI when reuse the same sparkcontext
zhichao-li created SPARK-6077:

Summary: Multiple spark streaming tabs on UI when reuse the same sparkcontext
Key: SPARK-6077
URL: https://issues.apache.org/jira/browse/SPARK-6077
Project: Spark
Issue Type: Bug
Components: Streaming, Web UI
Reporter: zhichao-li
Priority: Minor

Currently we create a new streaming tab for each StreamingContext even if there's already one on the same SparkContext, which causes duplicate StreamingTabs to be created, none of which takes effect.

Snapshot: https://www.dropbox.com/s/t4gd6hqyqo0nivz/bad%20multiple%20streamings.png?dl=0

How to reproduce:

{code}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.storage.StorageLevel

// 1)
val ssc = new StreamingContext(sc, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
ssc.start()
...

// 2)
ssc.stop(false)
val ssc = new StreamingContext(sc, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
ssc.start()
{code}