Spark 1.2.1: How to convert SchemaRDD to CassandraRDD?

2015-04-27 Thread Tash Chainar
Hi all, I have the following:

import com.datastax.spark.connector.SelectableColumnRef;
import com.datastax.spark.connector.japi.CassandraJavaUtil;
import org.apache.spark.sql.SchemaRDD;
import static com.datastax.spark.connector.util.JavaApiHelper.toScalaSeq;
import scala.collection.Seq;

SchemaRDD schemaRDD = cassandraSQLContext.sql("select user.id as user_id from user");
schemaRDD.cache();
schemaRDD.collect();

I want to do a SELECT on the schemaRDD as follows:
Seq<SelectableColumnRef> columnRefs =
    toScalaSeq(CassandraJavaUtil.toSelectableColumnRefs("user_id"));
CassandraRDD<UUID> rdd = schemaRDD.select(columnRefs);

but I am getting a type mismatch at schemaRDD.select():
incompatible types: Seq<SelectableColumnRef> cannot be converted to Seq<Expression>

so it seems I need to convert the SchemaRDD to a CassandraRDD. How can that be
done?
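
For reference, a minimal sketch of one way to get the same column as a CassandraJavaRDD
(the Java wrapper around CassandraRDD): read it directly through the connector's Java API
instead of going through the SchemaRDD. The keyspace name "my_ks" is a placeholder, and
jsc wraps the SparkContext already behind cassandraSQLContext:

import java.util.List;
import java.util.UUID;
import org.apache.spark.api.java.JavaSparkContext;
import com.datastax.spark.connector.japi.CassandraJavaUtil;
import com.datastax.spark.connector.japi.rdd.CassandraJavaRDD;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapColumnTo;

// wrap the existing SparkContext for the connector's Java API
JavaSparkContext jsc = new JavaSparkContext(cassandraSQLContext.sparkContext());

// read the id column of the user table directly; "my_ks" is a placeholder keyspace
CassandraJavaRDD<UUID> userIds = CassandraJavaUtil.javaFunctions(jsc)
    .cassandraTable("my_ks", "user", mapColumnTo(UUID.class))
    .select("id");

List<UUID> ids = userIds.collect();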


When is old data dropped from the cache?

2015-04-20 Thread Tash Chainar
Hi all,

On https://spark.apache.org/docs/latest/programming-guide.html,
under RDD Persistence > Removing Data, it states:

Spark automatically monitors cache usage on each node and drops out old
 data partitions in a least-recently-used (LRU) fashion.


Can this be understood to mean that the cache will be automatically refreshed
with new data? If so, when and how? How does Spark determine which data is old?

Regards.
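
For reference, a minimal sketch of the explicit alternative to waiting for LRU eviction,
reusing the query from the other threads (cached partitions are only dropped, never
refreshed in place; re-running the query is what picks up new data):

SchemaRDD s = cassandraSQLContext.sql("select user.id as user_id from user");
s.cache();          // mark for in-memory storage
s.collect();        // materialize and cache the current snapshot
s.unpersist(true);  // drop the cached partitions now, without waiting for LRU eviction

// a fresh query re-reads the current table contents
SchemaRDD fresh = cassandraSQLContext.sql("select user.id as user_id from user");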


ClassCastException while caching a query

2015-04-17 Thread Tash Chainar
Hi all,

Spark 1.2.1.
I have a Cassandra column family and am doing the following:

SchemaRDD s = cassandraSQLContext.sql("select user.id as user_id from user");
// user.id is UUID in table definition
s.registerTempTable("my_user");
s.cache(); // throws following exception

// tried the following, which also throws the same exception:
cassandraSQLContext.cacheTable("my_user");

Is there a way to resolve this? Regards.

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ClassCastException: java.util.UUID cannot be cast to java.lang.String
at org.apache.spark.sql.catalyst.expressions.GenericRow.getString(Row.scala:183)
at org.apache.spark.sql.columnar.StringColumnStats.gatherStats(ColumnStats.scala:208)
at org.apache.spark.sql.columnar.NullableColumnBuilder$class.appendFrom(NullableColumnBuilder.scala:56)
at org.apache.spark.sql.columnar.NativeColumnBuilder.org$apache$spark$sql$columnar$compression$CompressibleColumnBuilder$$super$appendFrom(ColumnBuilder.scala:87)
at org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.appendFrom(CompressibleColumnBuilder.scala:78)
at org.apache.spark.sql.columnar.NativeColumnBuilder.appendFrom(ColumnBuilder.scala:87)
at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:125)
at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:112)
at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:245)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
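
For reference, the trace shows the in-memory columnar builder calling getString() on a
value that is actually a java.util.UUID, so caching the id column as-is fails. One
possible workaround, sketched below under the assumption that the standard Spark 1.2
Java API (SchemaRDD.toJavaSchemaRDD, Row.get) behaves as documented, is to convert the
UUIDs to Strings and cache that RDD instead of caching the SchemaRDD itself:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.api.java.Row;

// convert the UUID column to Strings before caching, so the cache never
// has to call getString() on a java.util.UUID
JavaRDD<String> userIds = s.toJavaSchemaRDD()
    .map(new Function<Row, String>() {
        public String call(Row row) {
            // row.get(0) returns the raw value (a java.util.UUID here)
            return String.valueOf(row.get(0));
        }
    });
userIds.cache();
userIds.collect();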