There is also this project
https://github.com/SciSpark/SciSpark
It might be of interest to you, Christopher.
2017-12-16 3:46 GMT-05:00 Jörn Franke:
> Develop your own HadoopFileFormat and use
> https://spark.apache.org/docs/2.0.2/api/java/org/apache/spark/SparkContext
>
>
> Thanks,
> Akhilesh
>
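A minimal Scala sketch of the suggestion quoted above (write an InputFormat and hand it to SparkContext); TextInputFormat only stands in here for the custom FileInputFormat subclass you would develop, and the path is made up:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    // Plug a Hadoop InputFormat into Spark via newAPIHadoopFile.
    // TextInputFormat is a stand-in for your own FileInputFormat subclass.
    val sc = new SparkContext(new SparkConf().setAppName("custom-input-format"))
    val records = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///data/in")
    records.map(_._2.toString).take(5).foreach(println)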
Anybody? I think Rory also didn't get an answer from the list ...
https://mail-archives.apache.org/mod_mbox/spark-user/201602.mbox/%3ccac+fre14pv5nvqhtbvqdc+6dkxo73odazfqslbso8f94ozo...@mail.gmail.com%3E
2016-08-26 17:42 GMT+02:00 Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com>:
Hi all,
I am trying to use parquet files as input for DStream operations, but I
can't find any documentation or example. The only thing I found was [1] but
I also get the same error as in the post (Class
parquet.avro.AvroReadSupport not found).
Ideally I would like to have something like this:
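The snippet cuts off before the example, but one plausible shape for it, assuming the parquet-hadoop and parquet-avro jars are on the classpath (the missing parquet.avro.AvroReadSupport class suggests they were not), an existing SparkContext sc, and a made-up input directory:

    import org.apache.avro.generic.GenericRecord
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.mapreduce.Job
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import parquet.avro.AvroReadSupport
    import parquet.hadoop.ParquetInputFormat

    val ssc = new StreamingContext(sc, Seconds(30))

    // Tell ParquetInputFormat to materialize records through Avro; this is
    // exactly the class the quoted error says cannot be found.
    val job = Job.getInstance()
    ParquetInputFormat.setReadSupportClass(job, classOf[AvroReadSupport[GenericRecord]])

    // Watch a directory and read newly arriving parquet files as a DStream.
    val parquetStream = ssc.fileStream[Void, GenericRecord, ParquetInputFormat[GenericRecord]](
      "hdfs:///data/incoming",
      (p: Path) => p.getName.endsWith(".parquet"),
      newFilesOnly = true,
      job.getConfiguration)

    parquetStream.map(_._2).print()
    ssc.start()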
Hi Rahul,
You have probably already figured this one out, but anyway...
You need to register the classes that you'll be using with Kryo: it does not
support all Serializable types and requires you to register, in advance, the
classes the program will use. So when you don't register the ...
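For reference, registration looks roughly like this (Point is a made-up stand-in for the application's own classes; registrationRequired is optional but makes missing registrations fail fast):

    import org.apache.spark.SparkConf

    case class Point(x: Double, y: Double)  // stand-in for your own classes

    val conf = new SparkConf()
      .setAppName("kryo-registration")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Optional: error out immediately when an unregistered class is serialized.
      .set("spark.kryo.registrationRequired", "true")

    // Register every class (and array-of-class) that will be serialized.
    conf.registerKryoClasses(Array(classOf[Point], classOf[Array[Point]]))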
We also did some benchmarking using analytical queries similar to TPC-H
both with Spark and Presto, and our conclusion was that Spark is a great
general solution, but for analytical SQL queries it is still not there yet.
I mean, for 10 or 100 GB of data you will get your results back, but with
Presto
Hi Amit,
This is very interesting indeed because I have got similar results. I tried
doing a filter + groupBy using a Dataset with a function, and using the
inner RDD of the DataFrame (RDD[Row]). I used the inner RDD of a DataFrame
because apparently there is no straightforward way to create an RDD of
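The snippet cuts off here, but the two variants being compared presumably looked something like the following (Spark 2.x API; the Event schema, path, and predicate are invented for illustration):

    import org.apache.spark.sql.SparkSession

    case class Event(category: String, value: Double)  // invented schema

    val spark = SparkSession.builder().appName("ds-vs-rdd").getOrCreate()
    import spark.implicits._

    val ds = spark.read.parquet("/data/events").as[Event]

    // Variant 1: Dataset API with Scala functions; each row is deserialized
    // into an Event before the lambdas run.
    val dsCounts = ds.filter(_.value > 0.5).groupByKey(_.category).count()

    // Variant 2: the DataFrame's underlying RDD[Row].
    val rddCounts = ds.toDF().rdd
      .filter(_.getAs[Double]("value") > 0.5)
      .map(r => (r.getAs[String]("category"), 1L))
      .reduceByKey(_ + _)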
> ... Spark is finding
> the library correctly, otherwise the error message would be "no libraryname
> found" or something like that. The problem seems to be something else, and
> I'm not sure how to find it.
>
> Thanks,
> Bernardo
>
> On 14 October 2015 at 16:28, Renato Marroquín Mogrovejo wrote:
Sorry Bernardo, I just double checked. I use System.loadLibrary(...) with the
bare library name. Could you also try that? (A sketch follows the quoted
message below.)
Renato M.
2015-10-14 21:51 GMT+02:00 Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com>:
> Hi Bernardo,
>
> So is this in distributed mode? or single node? Mayb
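A minimal sketch of the loadLibrary approach ("nativecode" is a made-up library name, i.e. libnativecode.so must be on the java.library.path of every worker; sc is an existing SparkContext):

    object NativeLib {
      // Runs once per JVM, i.e. once per executor, when first referenced.
      System.loadLibrary("nativecode")
      @native def score(x: Double): Double
    }

    // Referencing the object inside a task triggers the load on each executor.
    val scored = sc.parallelize(1 to 100).map(i => NativeLib.score(i.toDouble))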
You can also try setting the env variable LD_LIBRARY_PATH to point to where
your compiled libraries are. (A config-based sketch follows the quoted
message below.)
Renato M.
2015-10-14 21:07 GMT+02:00 Bernardo Vecchia Stein:
> Hi Deenar,
>
> Yes, the native library is installed on all machines of the cluster. I
> tried a
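The same effect can be had through Spark's own configuration instead of exporting the variable by hand (the path is made up):

    import org.apache.spark.SparkConf

    // Prepends the directory to the native library search path
    // (LD_LIBRARY_PATH on Linux) of the driver and executor JVMs.
    val conf = new SparkConf()
      .set("spark.driver.extraLibraryPath", "/opt/native/lib")
      .set("spark.executor.extraLibraryPath", "/opt/native/lib")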
Hi all,
I have some doubts about the latest SparkSQL.
1. In the paper about SparkSQL it is stated that "The physical planner also
performs rule-based physical optimizations, such as pipelining projections
or filters into one Spark map operation." ...
If dealing with a query of the form:
using rows directly:
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#programmatically-specifying-the-schema
Avro or parquet input would likely give you the best performance.
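The linked section boils down to this pattern (Spark 1.3 API; the file name and columns follow the docs' own example, and sc/sqlContext are assumed to exist):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    // Build the schema explicitly, then pair it with an RDD[Row].
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)))

    val rowRDD = sc.textFile("people.txt")
      .map(_.split(","))
      .map(p => Row(p(0), p(1).trim.toInt))

    val peopleDF = sqlContext.createDataFrame(rowRDD, schema)
    peopleDF.registerTempTable("people")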
On Tue, Apr 21, 2015 at 4:28 AM, Renato Marroquín Mogrovejo
renatoj.marroq...@gmail.com wrote:
Thanks ... as you are not using a filter on the SQL side.
Best
Ayan
On 21 Apr 2015 08:05, Renato Marroquín Mogrovejo
renatoj.marroq...@gmail.com wrote:
Does anybody have an idea? A clue? A hint?
Thanks!
Renato M.
2015-04-20 9:31 GMT+02:00 Renato Marroquín Mogrovejo
renatoj.marroq...@gmail.com:
Hi all,
I have a simple query, "Select * from tableX where attribute1 between 0 and
5", that I run over a Kryo file with four partitions that ends up being
around 3.5 million rows in our case.
If I run this query by doing a simple map().filter() it takes around 9.6
seconds, but when I apply a schema,
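The snippet ends mid-sentence, but the two timed variants presumably looked roughly like this (Spark 1.3-era API; the Record class and loading via objectFile are assumptions, since the mail only says "a Kryo file"):

    case class Record(attribute1: Int, payload: String)  // assumed schema

    // Assumed: data written with saveAsObjectFile under the Kryo serializer.
    val records = sc.objectFile[Record]("/data/tableX")

    // Variant 1: plain map()/filter() on the RDD, no SQL layer involved.
    val plain = records.filter(r => r.attribute1 >= 0 && r.attribute1 <= 5)

    // Variant 2: apply a schema and run the same predicate through SparkSQL.
    import sqlContext.implicits._
    records.toDF().registerTempTable("tableX")
    val viaSql = sqlContext.sql("SELECT * FROM tableX WHERE attribute1 BETWEEN 0 AND 5")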
Hi all,
I am trying to understand how Spark lazy evaluation works, and I need some
help. I have noticed that creating an RDD once and using it many times
won't trigger recomputation every time it gets used, whereas creating a
new RDD for every new operation will trigger recomputation.
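For reference, a small sketch of how persistence interacts with lazy evaluation (made-up input path); whether a reused RDD is recomputed on each action depends on it being persisted:

    val base = sc.textFile("/data/input.txt").map(_.length)  // lineage only, nothing runs yet

    base.cache()             // mark for persistence; still nothing runs
    val total = base.sum()   // first action: computes the lineage, fills the cache
    val big   = base.max()   // second action: served from the cache, no recomputation
    // Without the cache() call, the second action would re-read and re-map the file.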
Hi Spark experts,
Is there a way to convert a JavaSchemaRDD (for instance loaded from a
parquet file) back to a JavaRDD of a given case class? I read on
StackOverflow [1] that I could do a select over the parquet file and then
get the fields out by reflection, but I guess that would be an
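The snippet ends here, but the usual pattern is simply to map over the rows; a Scala sketch for brevity (the Java API is analogous), with a made-up Person class and positional column access:

    import org.apache.spark.rdd.RDD

    case class Person(name: String, age: Int)  // made-up target class

    val schemaRDD = sqlContext.parquetFile("people.parquet")
    // A SchemaRDD is an RDD[Row]; pull the fields out by position.
    val people: RDD[Person] = schemaRDD.map(row => Person(row.getString(0), row.getInt(1)))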