Hi Ravi,
Welcome, you probably want RDD.saveAsTextFile("hdfs:///my_file")
Chris
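A minimal sketch of that call (the RDD contents and the output path here are hypothetical; note the path must not already exist, or the save fails):

```scala
// Hypothetical example: save an RDD of strings to HDFS as text.
val rdd = sc.parallelize(Seq("line one", "line two"))
rdd.saveAsTextFile("hdfs:///my_file")
// Output lands as part files (part-00000, part-00001, ...) under that directory.
```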
On Jun 22, 2015, at 5:28 PM, ravi tella ddpis...@gmail.com wrote:
Hello All,
I am new to Spark. I have a very basic question: how do I write the output of
an action on an RDD to HDFS?
Thanks in advance
Thanks for the quick reply and the welcome. I am trying to read a file from
HDFS and then write just the first line back to HDFS.
I am calling first() on the RDD to get the first line.
Sent from my iPhone
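One way to combine the two steps described above, sketched with hypothetical input and output paths: first() is an action that returns a single String to the driver, so it has to be wrapped back into an RDD before saveAsTextFile can write it.

```scala
// Sketch under assumed paths: take the first line of an input file
// and write it back to HDFS as a one-element RDD.
val lines = sc.textFile("hdfs:///input/my_file")   // hypothetical input path
val first = lines.first()                          // action: returns a String to the driver
sc.parallelize(Seq(first)).saveAsTextFile("hdfs:///output/first_line")
```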
On Jun 22, 2015, at 7:42 PM, Chris Gore cdg...@cdgore.com wrote:
Hi Ravi,
Welcome, you
I tried running this data set as described with my own implementation of L2
regularized logistic regression using LBFGS to compare:
https://github.com/cdgore/fitbox
Intercept: -0.886745823033
Weights (['gre', 'gpa', 'rank']):[ 0.28862268 0.19402388 -0.36637964]
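For readers following along, the fitted intercept and weights above turn into a predicted probability via the usual logistic link; the feature values in this sketch are made up for illustration.

```scala
// Sketch: score one applicant with the fitted model reported above.
// Feature order follows the reported weights: gre, gpa, rank.
val intercept = -0.886745823033
val weights = Array(0.28862268, 0.19402388, -0.36637964)
val features = Array(0.5, 0.7, -1.0)  // hypothetical standardized inputs
val margin = intercept + weights.zip(features).map { case (w, x) => w * x }.sum
val probability = 1.0 / (1.0 + math.exp(-margin))  // logistic link
```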
Good to hear there will be partitioning support. I’ve had some success loading
partitioned data specified with Unix glob format, i.e.:
sc.textFile("s3://bucket/directory/dt=2014-11-{2[4-9],30}T00-00-00")
would load dates 2014-11-24 through 2014-11-30. Not the most ideal solution,
but it
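An alternative to glob matching, if the date range is computed in code, is to build the paths explicitly and join them with commas — sc.textFile accepts a comma-separated list of paths. Bucket and layout below are hypothetical.

```scala
// Sketch: build the same 2014-11-24..30 range programmatically and pass a
// comma-separated path list, which sc.textFile also accepts.
val dates = (24 to 30).map(d => f"s3://bucket/directory/dt=2014-11-$d%02dT00-00-00")
val rdd = sc.textFile(dates.mkString(","))
```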
Hi Sameer,
MLLib uses Breeze’s vector format under the hood. You can use that.
http://www.scalanlp.org/api/breeze/index.html#breeze.linalg.SparseVector
For example:
import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
val numClasses = classes.distinct.count.toInt
val
`Vectors.sparse`:
val sv = Vectors.sparse(numProducts, productIds.map(x => (x, 1.0)))
where numProducts should be the largest product id plus one.
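Putting that together as a self-contained sketch (the product ids here are hypothetical; Vectors.sparse takes the vector size and a Seq of (index, value) pairs):

```scala
// Sketch: build a sparse binary product vector as described above.
import org.apache.spark.mllib.linalg.Vectors
val productIds = Seq(0, 3, 7)   // hypothetical product ids
val numProducts = 8             // largest product id plus one
val sv = Vectors.sparse(numProducts, productIds.map(x => (x, 1.0)))
```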
Best,
Xiangrui
On Mon, Sep 15, 2014 at 12:46 PM, Chris Gore cdg...@cdgore.com wrote:
Hi Sameer,
MLLib uses Breeze’s vector format under the hood
There is support for Spark in ElasticSearch’s Hadoop integration package.
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/spark.html
Maybe you could split and insert all of your documents from Spark and then
query for “MoreLikeThis” on the ElasticSearch index. I haven’t
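A rough sketch of the write side, assuming the elasticsearch-hadoop ("elasticsearch-spark") artifact is on the classpath; the index/type name and documents are hypothetical:

```scala
// Sketch: index RDD elements as JSON documents via elasticsearch-spark.
import org.elasticsearch.spark._
val docs = sc.parallelize(Seq(Map("title" -> "doc one"), Map("title" -> "doc two")))
docs.saveToEs("articles/docs")  // each Map becomes one document in the index
```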
Hi Chris,
I've encountered this error when running Spark’s ALS methods too. In my case,
it was because I set spark.local.dir improperly, and every time there was a
shuffle, it would spill many GB of data onto the local drive. What fixed it
was setting it to use the /mnt directory, where a
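A minimal sketch of that fix, assuming a large volume mounted at a hypothetical /mnt/spark; spark.local.dir must be set before the SparkContext is created:

```scala
// Sketch: point shuffle spill space at a large local volume.
import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf()
  .setAppName("als-job")                 // hypothetical app name
  .set("spark.local.dir", "/mnt/spark")  // hypothetical mount with ample space
val sc = new SparkContext(conf)
```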
We'd love to see a Spark user group in Los Angeles and connect with others
working with it here.
Ping me if you're in the LA area and use Spark at your company (
ch...@retentionscience.com ).
Chris
Retention Science
call: 734.272.3099
On Mar