Jakob Odersky created SPARK-12350:
-------------------------------------
Summary: VectorAssembler#transform() initially throws an exception
Key: SPARK-12350
URL: https://issues.apache.org/jira/browse/SPARK-12350
Project: Spark
Issue Type: Bug
Components: ML
Environment: sparkShell command from sbt
Reporter: Jakob Odersky
Calling VectorAssembler.transform() throws an exception the first time; subsequent calls work.
h3. Steps to reproduce
In spark-shell,
1. Create a dummy DataFrame and define an assembler:
{code}
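import org.apache.spark.ml.feature.VectorAssembler
// toDF comes from sqlContext.implicits._, which spark-shell imports automatically;
// the tuple fields become the columns _1 and _2
val df = sc.parallelize(List((1, 2), (3, 4))).toDF
// Kept on one line so the REPL does not evaluate a partial expression
val assembler = new VectorAssembler().setInputCols(Array("_1", "_2")).setOutputCol("features")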
{code}
2. Run:
{code}
assembler.transform(df).show
{code}
On the first call, the following exception is thrown:
{code}
15/12/15 16:20:19 ERROR TransportRequestHandler: Error opening stream /classes/org/apache/spark/sql/catalyst/expressions/Object.class for request from /9.72.139.102:60610
java.lang.IllegalArgumentException: requirement failed: File not found: /classes/org/apache/spark/sql/catalyst/expressions/Object.class
	at scala.Predef$.require(Predef.scala:233)
	at org.apache.spark.rpc.netty.NettyStreamManager.openStream(NettyStreamManager.scala:60)
	at org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:136)
	at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:745)
{code}
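The message prefix "requirement failed:" comes from Scala's {{Predef.require}}, which throws an {{IllegalArgumentException}} when its condition is false; the trace shows it being called from {{NettyStreamManager.openStream}} because the requested class file is not registered with the driver. The sketch below reproduces only that message format. {{RequireSketch}}, {{registeredFiles}} and {{openStream}} here are hypothetical stand-ins, not Spark's actual code.

{code}
object RequireSketch {
  // Hypothetical registry of files the driver can serve (not Spark's data structure).
  val registeredFiles: Set[String] = Set("/classes/Foo.class")

  // Mimics a lookup guarded by require: an unknown path raises IllegalArgumentException
  // whose message is "requirement failed: " followed by the supplied text.
  def openStream(path: String): String = {
    require(registeredFiles.contains(path), s"File not found: $path")
    path
  }

  def main(args: Array[String]): Unit = {
    try {
      openStream("/classes/org/apache/spark/sql/catalyst/expressions/Object.class")
    } catch {
      case e: IllegalArgumentException => println(e.getMessage)
    }
  }
}
{code}

Running {{main}} prints the same message text as the second line of the trace above.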
Subsequent calls work:
{code}
+---+---+---------+
| _1| _2| features|
+---+---+---------+
|  1|  2|[1.0,2.0]|
|  3|  4|[3.0,4.0]|
+---+---+---------+
{code}
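For reference, the output of the successful call matches what {{VectorAssembler}} is expected to produce: it concatenates the input columns of each row into a single vector column, with the integer values converted to {{Double}}. The plain-Scala sketch below models that for the two-row example without Spark. {{AssemblerSketch}} and {{assembleRow}} are hypothetical names, not Spark APIs, and the sketch only handles integer input columns.

{code}
// Hypothetical model of the expected VectorAssembler output for integer input columns.
// This is NOT Spark's implementation; it only illustrates the expected "features" values.
object AssemblerSketch {
  // Concatenate the selected columns of one row into a single Double vector.
  def assembleRow(row: Map[String, Int], inputCols: Seq[String]): Vector[Double] =
    inputCols.map(c => row(c).toDouble).toVector

  def main(args: Array[String]): Unit = {
    // Same data as sc.parallelize(List((1, 2), (3, 4))).toDF, with columns _1 and _2.
    val rows = List(Map("_1" -> 1, "_2" -> 2), Map("_1" -> 3, "_2" -> 4))
    val features = rows.map(r => assembleRow(r, Seq("_1", "_2")))
    // Print in the same bracketed form that show uses for the features column.
    features.foreach(v => println(v.mkString("[", ",", "]")))
  }
}
{code}

Running {{main}} prints {{[1.0,2.0]}} and {{[3.0,4.0]}}, the same values as the {{features}} column above.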
It seems as though some internal state is not initialized until after the first call.
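Since only the first call fails, one possible stopgap until the root cause is fixed is to retry once on failure. The sketch below is a hypothetical helper ({{retryOnce}} is not a Spark API); it assumes the failure reaches the caller as a non-fatal exception, and it only masks the symptom rather than fixing it.

{code}
import scala.util.{Failure, Success, Try}

object RetrySketch {
  // Hypothetical helper: evaluate `op`; if it throws a non-fatal exception, evaluate it once more.
  def retryOnce[T](op: => T): T = Try(op) match {
    case Success(value) => value
    case Failure(_)     => op
  }

  def main(args: Array[String]): Unit = {
    var calls = 0
    // Simulated operation that fails only on its first invocation, like the reported behavior.
    def flaky(): String = {
      calls += 1
      if (calls == 1) throw new IllegalArgumentException("first call fails")
      "ok"
    }
    println(retryOnce(flaky()))
    println(calls)
  }
}
{code}

In spark-shell this would be used as {{RetrySketch.retryOnce(assembler.transform(df).show())}}.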
[~iyounus] originally found this issue.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)