that substr is supported by
HiveQL, but not by Spark SQL, correct?
Thanks!
Tom
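For reference, HiveQL's substr() is 1-based (unlike Python/Scala slicing), and a negative start counts back from the end of the string. A plain-Python sketch of those semantics as I understand them (this is a hypothetical helper for illustration, not Spark code):

```python
def hive_substr(s, start, length=None):
    """Plain-Python model of HiveQL's substr(): 1-based indexing,
    with a negative start counting back from the end of the string."""
    if start < 0:
        begin = len(s) + start      # e.g. start=-5 on 'Facebook' -> index 3
    elif start > 0:
        begin = start - 1           # 1-based -> 0-based
    else:
        begin = 0                   # Hive appears to treat start=0 like start=1
    end = begin + length if length is not None else len(s)
    return s[begin:end]
```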
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Substring-in-Spark-SQL-tp11373.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey
properties (respectively).
I guess the files are publicly available, but only to registered AWS users,
so I caved in and registered for the service. Using the credentials that I
got I was able to download the files using the local spark shell.
Thanks!
Tom
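For anyone hitting the same prompt: the usual place to supply those two properties is Hadoop's core-site.xml (or the equivalent settings on the SparkContext's Hadoop configuration). The values below are placeholders, not real credentials:

```xml
<!-- core-site.xml: S3 native filesystem credentials (placeholder values) -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```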
Hi Burak,
Thank you for your pointer; it really helped. I do have some
follow-up questions, though.
After looking at the Big Data Benchmark page
https://amplab.cs.berkeley.edu/benchmark/ (Section Run this benchmark
yourself), I was expecting the following combination of files:
Sets:
],
in the amazon cluster. Is there a way I can download this without being a
user of the Amazon cluster? I tried
bin/hadoop distcp s3n://123:456@big-data-benchmark/pavlo/text/tiny/* ./
but it asks for an AWS Access Key ID and Secret Access Key which I do not
have.
Thanks in advance,
Tom
Spark gives you four of the classical collectives: broadcast, reduce,
scatter, and gather. There are also a few additional primitives, mostly
based on a join. Spark is certainly less optimized than MPI for these, but
maybe that isn't such a big deal. Spark has one theoretical disadvantage
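To make the mapping concrete, here is a toy, single-process sketch of the four collectives named above, with the Spark calls they usually correspond to in comments (the correspondence is the common analogy, not exact MPI semantics):

```python
# Toy, driver-only model of the four collectives.
partitions = [[1, 2], [3, 4], [5, 6]]       # stand-in for an RDD's partitions

# broadcast: ship one read-only value to every task (sc.broadcast(x))
factor = 10

# reduce: combine all elements with an associative op (rdd.reduce(_ + _))
total = sum(x * factor for part in partitions for x in part)

# scatter: split driver-local data across n partitions (sc.parallelize(data, n))
def scatter(data, n):
    return [data[i::n] for i in range(n)]

# gather: pull every partition back to the driver (rdd.collect())
def gather(parts):
    return [x for part in parts for x in part]
```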
to. But they shouldn't have
overlapped as far as both being up at the same time. Is that the case you are
seeing? Generally you want to look at why the first application attempt fails.
Tom
On Wednesday, May 21, 2014 6:10 PM, Kevin Markey kevin.mar...@oracle.com
wrote:
I tested an application on RC-10
I've done some comparisons with my own implementation of TRON on Spark.
From a distributed computing perspective, it does 2x more local work per
iteration than LBFGS, so the parallel isoefficiency is improved slightly.
I think the truncated Newton solver holds some potential because there
have
of
all node managers. Thus, this is not applicable to hosted clusters).
Tom
On Monday, May 12, 2014 9:38 AM, Sai Prasanna ansaiprasa...@gmail.com wrote:
Hi All,
I wanted to launch Spark on Yarn, interactive - yarn client mode.
With the default settings of yarn-site.xml and spark-env.sh, I
either go to the RM UI to link to the Spark history UI, or go directly to the Spark history server UI.
Tom
On Thursday, May 1, 2014 7:09 PM, Jenny Zhao linlin200...@gmail.com wrote:
Hi,
I have installed spark 1.0 from the branch-1.0, build went fine, and I have
tried running the example
As to your last line: I've used RDD zipping to avoid GC since MyBaseData is
large and doesn't change. I think this is a very good solution to what is
being asked for.
On Mon, Apr 28, 2014 at 10:44 AM, Ian O'Connell i...@ianoconnell.com wrote:
A mutable map in an object should do what your
I'm not sure what I said came through. RDD zip is not hacky at all, as it
only depends on a user not changing the partitioning. Basically, you would
keep your losses as an RDD[Double] and zip those with the RDD of examples,
and update the losses. You're doing a copy (and GC) on the RDD of
Right---They are zipped at each iteration.
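A minimal single-machine sketch of the pattern being described, with plain Python lists standing in for the two RDDs (in Spark this would be losses.zip(examples), which is only safe while both RDDs keep the same partitioning; the update rule is a placeholder, not from the thread):

```python
# Keep the large, immutable examples separate from the small per-example
# losses; "zip" them each iteration and rebuild only the losses.
examples = [(1.0, 2.0), (3.0, 4.0)]   # large RDD analogue, never copied
losses = [0.0, 0.0]                   # small RDD[Double] analogue

def step(losses, examples):
    # Placeholder update: new loss is a function of the old loss and
    # its example. Only this small collection is recreated (and GC'd).
    return [l + x + y for l, (x, y) in zip(losses, examples)]

losses = step(losses, examples)
```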
On Mon, Apr 28, 2014 at 11:56 AM, Chester Chen chesterxgc...@yahoo.comwrote:
Tom,
Are you suggesting two RDDs, one with loss and another for the rest
info, using zip to tie them together, but do update on loss RDD (copy) ?
Chester
Sent from
Ian, I tried playing with your suggestion, but I get a task not
serializable error (and some obvious things didn't fix it). Can you get
that working?
On Mon, Apr 28, 2014 at 10:58 AM, Tom Vacek minnesota...@gmail.com wrote:
As to your last line: I've used RDD zipping to avoid GC since
to. For instance, will RDDs of the
same size usually get partitioned to the same machines - thus not
triggering any cross machine aligning, etc. We'll explore it, but I would
still very much like to see more direct worker memory management besides
RDDs.
On Mon, Apr 28, 2014 at 10:26 AM, Tom
Here are some out-of-the-box ideas: If the elements lie in a fairly small
range and/or you're willing to work with limited precision, you could use
counting sort. Moreover, you could iteratively find the median using
bisection; the counting step in each iteration is associative and
commutative, so it distributes cleanly. It's easy to think
of
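A plain-Python sketch of the bisection idea: the only quantity each iteration needs is count(x <= mid), which is a sum and therefore a natural per-partition map followed by a reduce (function name and bounds handling are mine, for illustration):

```python
def distributed_median(xs, tol=1e-9):
    """Find the (lower) median of xs by bisecting on the value range.
    Each iteration needs only count(x <= mid) -- a sum, so in a
    distributed setting it would be one map + reduce per iteration."""
    lo, hi = min(xs) - 1.0, max(xs)
    target = (len(xs) + 1) // 2          # rank of the lower median
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(1 for x in xs if x <= mid) >= target:
            hi = mid                     # median is <= mid
        else:
            lo = mid                     # median is > mid
    return hi
```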
Thomson Reuters is looking for a graduate (or possibly advanced
undergraduate) summer intern in Eagan, MN. This is a chance to work on an
innovative project exploring how big data sets can be used by professionals
such as lawyers, scientists and journalists. If you're subscribed to this
mailing
should be able to distribute the things needed to
make a recommendation (either the centroids or the attributes matrix), and
just break up the work based on the users you want to generate
recommendations for. I hope this helps.
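A toy version of that split (the names and the distance measure are mine, not from the thread): the small centroids table is what you would broadcast (sc.broadcast(centroids) in Spark), and each user in your slice is then scored locally:

```python
def nearest_centroid(user_vec, centroids):
    # centroids are small enough to ship to every worker; each user
    # is scored independently, so the work splits cleanly by user.
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: dist2(user_vec, centroids[i]))

centroids = [(0.0, 0.0), (5.0, 5.0)]
users = {"u1": (0.5, 0.2), "u2": (4.0, 6.0)}
recs = {u: nearest_centroid(v, centroids) for u, v in users.items()}
```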
Tom
On Sat, Apr 12, 2014 at 11:35 AM, Xiaoli Li lixiaolima
Do we have a list of things we really want to get in for 1.X? Perhaps move
any JIRAs out to a 1.1 release if we aren't targeting them for 1.0.
It might be nice to send out reminders when these dates are approaching.
Tom
On Thursday, April 3, 2014 11:19 PM, Bhaskar Dutta bhas...@gmail.com
helped out with this prototype over Twitter’s hack week.) That work
also calls
the Scala API directly, because it was done before we had a Java API; it should
be easier
with the Java one.
Tom
On Thursday, March 6, 2014 3:11 PM, Sameer Tilak ssti...@live.com wrote:
Hi everyone,
We are using