Use several jobs and orchestrate them, e.g. via Oozie. These jobs can then save
intermediate results to disk and load them from there. Alternatively (or
additionally!) you may use persist (to memory and disk), but I am not sure this
is suitable for such long-running applications.
> On 12. May
"city": "Taipei",
>>> "localName": "Taoyuan Intl.",
>>> "airportCityState": "Taipei, Taiwan"
>>>
It depends on your queries, the data structure, etc. Generally flat is better,
but if your query filters on the top level then you may get better performance
with a nested structure. It really depends.
> On 30. Apr 2017, at 10:19, Zeming Yu wrote:
>
> Hi,
>
> We're
is spilling temp file, shuffle data and
> application data ?
>
> Thanks
> Shashi
>
>> On Fri, Apr 28, 2017 at 3:54 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>> You can use disk encryption as provided by the operating system.
>> Additionally, you may think about shredding disks after they are not used anymore.
You can use disk encryption as provided by the operating system. Additionally,
you may think about shredding disks after they are not used anymore.
> On 28. Apr 2017, at 14:45, Shashi Vishwakarma
> wrote:
>
> Hi All
>
> I was dealing with one the spark requirement
I am not sure what you are trying to achieve here. You should never use an
ArrayList as a global variable the way you do here (an anti-pattern). Why don't
you use the count function of the DataFrame?
> On 24. Apr 2017, at 19:36, Devender Yadav
> wrote:
>
> Hi All,
>
>
What is your DWH technology?
If the file is on HDFS then, depending on the format, Spark can read parts of
it in parallel.
> On 21. Apr 2017, at 20:36, Paul Tremblay wrote:
>
> We are tasked with loading a big file (possibly 2TB) into a data warehouse.
> In order to
> To the best of my knowledge, HBase works best for records of around hundreds of KB, and
> it requires extra work of the cluster administrator. So this would be the
> last option.
>
>
> Thanks!
>
>
>
> Mo Tao
>
> From: Jörn Franke <jornfra...@gmail.com>
> Sent: 2
> Please note that all processing will be done in Spark here. Please share your
> thoughts. Thanks again.
>
>> On Mon, Apr 17, 2017 at 12:58 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>> I think it highly depends on your requirements. There are various tools for
>>
You need to sort the data by id, otherwise a situation can occur where the
index does not work. Aside from this, it sounds odd to put a 5 MB column into
those formats. This will also not be very efficient.
What is in the 5 MB binary data?
You could use HAR or maybe HBase to store this kind of
I think it highly depends on your requirements. There are various tools for
analyzing and visualizing data. How many concurrent users do you have? What
analysis do they do? How much data is involved? Do they have to process the
data all the time or can they live with sampling which increases
I think in the end you need to check the coverage of your application. If your
application is well covered on the job or pipeline level (depends however on
how you implement these tests) then it can be fine.
In the end it really depends on the data and what kind of transformation you
As far as I am aware, in newer Spark versions a DataFrame is the same as a
Dataset[Row].
In fact, performance depends on so many factors that I am not sure such a
comparison makes sense.
> On 8. Apr 2017, at 20:15, Shiyuan wrote:
>
> Hi Spark-users,
> I came across a few
Maybe using Ranger or Sentry would be the better choice to intercept those
calls?
> On 7. Apr 2017, at 16:32, Alvaro Brandon wrote:
>
> I was going through the SparkContext.textFile() and I was wondering at that
> point does Spark communicates with HDFS. Since when
How do you read them?
> On 7. Apr 2017, at 12:11, Jacek Laskowski wrote:
>
> Hi,
>
> If your Spark app uses snappy in the code, define an appropriate library
> dependency to have it on classpath. Don't rely on transitive dependencies.
>
> Jacek
>
> On 7 Apr 2017 8:34
I do think this is the right way: you will have to do testing with test data,
verifying that the calculation produces the expected output.
Even if the logical plan is correct, your calculation might not be. E.g. there
can be bugs in Spark, in the UI, or (what is very often the case) in the client
And which version does your Spark cluster use?
> On 6. Apr 2017, at 16:11, nayan sharma <nayansharm...@gmail.com> wrote:
>
> scalaVersion := "2.10.5"
>
>
>
>
>> On 06-Apr-2017, at 7:35 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>
pration-assembly-1.0.jar | grep csv
>>
>> after doing this I have found a lot of classes under
>> com/databricks/spark/csv/
>>
>> do I need to check for any specific class ??
>>
>> Regards,
>> Nayan
>>> On 06-Apr-2017, at 6:42 PM, Jörn Franke
Is the library in your assembly jar?
> On 6. Apr 2017, at 15:06, nayan sharma wrote:
>
> Hi All,
> I am getting error while loading CSV file.
>
> val
> datacsv=sqlContext.read.format("com.databricks.spark.csv").option("header",
> "true").load("timeline.csv")
>
If you trust that your delta file is correct then this might be the way
forward. You just have to keep in mind that sometimes you can have several
delta files in parallel and you need to apply them in the correct order, or
otherwise a deleted row might reappear again. Things get more messy if a
You can always repartition, but maybe for your use case different RDDs with the
same data but different partition strategies could make sense. It may also
make sense to choose an appropriate format on disk (ORC, Parquet). You have to
choose based also on the users' non-functional requirements.
$BUILD_SBT_FILE << !
>>> > lazy val root = (project in file(".")).
>>> > settings(
>>> > name := "${APPLICATION}",
>>> > version := "1.0",
>>> > scalaVersion := "2.11.8",
Usually you define the dependencies to the Spark library as provided. You also
seem to mix different Spark versions which should be avoided.
The Hadoop library seems to be outdated and should also only be provided.
The other dependencies you could assemble in a fat jar.
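As a sketch, a hypothetical build.sbt fragment along these lines (versions and artifact names are illustrative examples, not taken from the thread):

```scala
// Hypothetical build.sbt sketch: Spark and Hadoop marked "provided" so they
// are not bundled into the fat jar; only the remaining dependencies are
// assembled. Versions here are examples only.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.1.0" % "provided",
  "org.apache.hadoop" % "hadoop-client" % "2.7.3" % "provided",
  // non-provided dependencies end up in the assembly (fat) jar
  "com.databricks" %% "spark-csv" % "1.5.0"
)
```

Keeping all Spark artifacts on one version and marking them "provided" avoids both the version mixing and the jar bloat mentioned above.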
> On 27 Mar 2017, at
What do you mean by "clear"? What is the use case?
> On 23 Mar 2017, at 10:16, nayan sharma wrote:
>
> Does Spark clears the persisted RDD in case if the task fails ?
>
> Regards,
>
> Nayan
> // TODO Auto-generated method stub
> return null;
> }
>
> }
>
> which fails too...
>
> java.lang.NullPointerException
> at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(
> LogicalRelation.scala:40)
> at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(
> SparkSession.sc
I think you can develop a Spark data source in Java, but you are right that
most use Scala for the Spark glue code even if they have a Java library (this
is what I did for the project I open sourced). Coming back to your question, it
is a little bit difficult to assess the exact issue without the code.
You
Hi,
The Spark CSV parser has different parsing modes:
* permissive (default): tries to read everything; missing tokens are
interpreted as null and extra tokens are ignored
* dropmalformed: drops lines which have more or fewer tokens than expected
* failfast: throws a RuntimeException if there is a malformed line
this with ease, I was just wondering
>> what people are using.
>>
>> Jenkins seems to have the best spark plugins so we are investigating that as
>> well as a variety of other hosted CI tools
>>
>> Happy to write a blog post detailing our findings and sharing i
Hi,
Jenkins also now supports pipeline as code and multibranch pipelines, thus you
are not so dependent on the UI and you no longer need a long list of jobs for
different branches. Additionally it has a new UI (beta) called Blue Ocean,
which is a little bit nicer. You may also check GoCD.
I find this question strange. There is no best tool for every use case. For
example, both tools mentioned below are suitable for different purposes,
sometimes also complementary.
> On 9 Mar 2017, at 20:37, Gaurav1809 wrote:
>
> Hi All, Would you please let me know
You seem to always generate a new RDD instead of reusing the existing one. So
it does not seem surprising that the memory usage is growing.
> On 9 Mar 2017, at 15:24, Facundo Domínguez wrote:
>
> Hello,
>
> Some heap profiling shows that memory grows under a TaskMetrics
This depends on your target setup! For my open source libraries I run Spark
integration tests (a dedicated folder alongside the unit tests) against a local
Spark master, but also use a MiniDFSCluster (to simulate HDFS on a node) and
sometimes also a MiniYARNCluster (see
Can you provide some source code? I am not sure I understood the problem.
If you want to do a preprocessing at the JDBC datasource then you can write
your own data source. Additionally you may want to modify the sql statement to
extract the data in the right format and push some preprocessing
I agree with the others that a dedicated NoSQL datastore can make sense. You
should look at the lambda architecture paradigm. Keep in mind that more memory
does not necessarily mean more performance; what matters is the right data
structure for your users' queries. Additionally, if your queries
I think this highly depends on the risk that you want to be exposed to. If you
have it on dedicated nodes there is less influence of other processes.
I have seen both: on Hadoop nodes or dedicated. On Hadoop I would not recommend
putting it on data nodes/heavily utilized nodes.
Zookeeper does
I am not sure that Spark Streaming is what you want to do. It is for streaming
analytics, not for loading into a DWH.
You also need to define what realtime means and what is needed there - it will
differ from client to client significantly.
From my experience, just SQL is not enough for the users
You do not need to place it in every local directory of every node. Just use
hadoop fs -put to put it on HDFS. Alternatively, as others suggested, use S3.
> On 28 Feb 2017, at 02:18, Yunjie Ji wrote:
>
> After start the dfs, yarn and spark, I run these code under the root
>
What do you want to do with the event log? The Hadoop command line can show
compressed files (hadoop fs -text). Alternatively there are tools depending on
your OS... You can also write a small job to do this and run it on the cluster.
> On 15 Feb 2017, at 10:55, satishl
Can you check in the UI which tasks took most of the time?
Even the 45 min looks a little bit much, given that most of the time you only
work with 50k rows.
> On 15 Feb 2017, at 00:03, Timur Shenkao wrote:
>
> Hello,
> I'm not sure that's your reason but check this
> successful writing of the new one.
> Thanks,
> Assaf.
>
>
> From: Steve Loughran [mailto:ste...@hortonworks.com]
> Sent: Tuesday, February 14, 2017 3:25 PM
> To: Mendelson, Assaf
> Cc: Jörn Franke; user
> Subject: Re: fault tolerant dataframe write with overwrite
Normally you can fetch the filesystem interface from the configuration (I
assume you mean URI).
Regarding getting the last iteration: I do not understand the issue. You can
use the current timestamp as the directory name and at the end simply select
the directory with the highest number.
Well, 1) the goal of wholeTextFiles is to have only one executor; 2) you use
.gz, i.e. you will have at most one executor per file.
> On 14 Feb 2017, at 09:36, Henry Tremblay wrote:
>
> When I use wholeTextFiles, spark does not run in parallel, and yarn runs out
> of
Your vendor should use the Parquet-internal compression and not take a Parquet
file and gzip it.
> On 13 Feb 2017, at 18:48, Benjamin Kim wrote:
>
> We are receiving files from an outside vendor who creates a Parquet data file
> and Gzips it before delivery. Does anyone
Cf. also https://spark.apache.org/docs/latest/job-scheduling.html
> On 12 Feb 2017, at 11:30, Jörn Franke <jornfra...@gmail.com> wrote:
>
> I think you should have a look at the spark documentation. It has something
> called scheduler who does exactly this. In more sophisti
> On 12 Feb 2017, at 11:45, Sean Owen <so...@cloudera.com> wrote:
>
> No this use case is perfectly sensible. Yes it is thread safe.
>
>> On Sun, Feb 12, 2017, 10:30 Jörn Franke <jornfra...@gmail.com> wrote:
>> I think you should have a look at the spark docume
sistent result. So my question is, what, if any are the legal
> operations to use on a dataframe so I could do the above.
>
> Thanks,
> Assaf.
>
> From: Jörn Franke [mailto:jornfra...@gmail.com]
> Sent: Sunday, February 12, 2017 10:39 AM
> To: Men
You have to carefully check whether your strategy makes sense given your users'
workloads; I am not sure your reasoning does.
You can, for example, install OpenStack Swift as an object store and use this
as storage. HDFS in this case can be used as a temporary store
I am not sure what you are trying to achieve here. Spark is taking care of
executing the transformations in a distributed fashion. This means you must not
use threads - it does not make sense. Hence, you do not find documentation
about it.
> On 12 Feb 2017, at 09:06, Mendelson, Assaf
Can you post more information about the number of files, their size and the
executor logs?
A gzipped file is not splittable, i.e. only one executor can gunzip it (the
unzipped data can then be processed in parallel).
wholeTextFiles was designed to be executed only on one executor (e.g. for
Resource management in YARN cluster mode is YARN's task. So it depends on how
you configured the queues and the scheduler there.
> On 8 Feb 2017, at 12:10, Cosmin Posteuca wrote:
>
> I tried to run some test on EMR on yarn cluster mode.
>
> I have a cluster with
Depends on the use case, but a persist before checkpointing can make sense
after some of the map steps.
> On 8 Feb 2017, at 03:09, Shushant Arora wrote:
>
> Hi
>
> I have a workflow like below:
>
> rdd1 = sc.textFile(input);
> rdd2 = rdd1.filter(filterfunc1);
>
If you want to run them always on the same machines use yarn node labels. If it
is any 10 machines then use capacity or fair scheduler.
What is the use case for running it always on the same 10 machines? If it is
for licensing reasons then I would ask your vendor if this is a suitable means
to
hnical content is explicitly disclaimed. The
> author will in no case be liable for any monetary damages arising from such
> loss, damage or destruction.
>
>
>> On 29 January 2017 at 22:22, Jörn Franke <jornfra...@gmail.com> wrote:
>> You can use HDFS, S3, Azure,
Not sure what your UDF is exactly doing, but why not one UDF per type? You
avoid issues converting it, and it is more obvious for the user of your UDF,
etc. You could of course return a complex type with one long, one string and
one double and fill them in the UDF as needed, but this would be
There are many performance aspects here which may not only be related to the
UDF itself, but to the configuration of the platform, the data, etc.
You seem to have a performance problem with your UDFs. Maybe you can elaborate
on
1) what data you process (format, etc.)
2) what you try to analyse
3) how you
>
>
>> On 30 January 2017 at 21:51, Jörn Franke <jornfra...@gmail.com> wrote:
>> Depending on the size of the data i recommend to schedule regularly an
>> extract in tableau. There
Depending on the size of the data I recommend scheduling a regular extract in
Tableau. Tableau converts it to an internal in-memory representation outside of
Spark (it can also exist on disk if memory is too small) and then uses it
within Tableau. Accessing the database directly is not
I meant a distributed file system such as Ceph, Gluster, etc.
> On 29 Jan 2017, at 14:45, Jörn Franke <jornfra...@gmail.com> wrote:
>
> One alternative could be the oracle Hadoop loader and other Oracle products,
> but you have to invest some money and probably buy thei
One alternative could be the Oracle Hadoop loader and other Oracle products,
but you have to invest some money and probably buy their Hadoop Appliance,
which you have to evaluate whether it makes sense (it can get expensive with
large clusters etc.).
Another alternative would be to get rid of Oracle
Hard to tell. Can you give more insights on what you try to achieve and what
the data is about?
For example, depending on your use case sqoop can make sense or not.
> On 28 Jan 2017, at 02:14, Sirisha Cheruvu wrote:
>
> Hi Team,
>
> RIght now our existing flow is
>
>
Sorry, the message was not complete: the key is the file position, so if you
sort by key the lines will be in the same order as in the original file.
> On 27 Jan 2017, at 14:45, Jörn Franke <jornfra...@gmail.com> wrote:
>
> I agree with the previous statements. You cannot expe
I agree with the previous statements. You cannot expect any ordering guarantee.
This means you need to ensure that the same ordering is done as the original
file. Internally Spark is using the Hadoop Client libraries - even if you do
not have Hadoop installed, because it is a flexible
know if we can run this from
> within our local machine? given that all the required jars are downloaded by
> SBT anyways.
>
>> On 20 January 2017 at 11:22, Jörn Franke <jornfra...@gmail.com> wrote:
>> It is only on pairdd
>>
>>> On 20 Jan 2017, at 11:54,
It is only on PairRDD.
> On 20 Jan 2017, at 11:54, A Shaikh wrote:
>
> Has anyone experience saving Dataset to Bigquery Table?
>
> I am loading into BigQuery using the following example sucessfully. This uses
> RDD.saveAsNewAPIHadoopDataset method to save data.
> I am
You can use Zeppelin if you want to directly interact with Spark. For
traditional tools you have the right ideas (any of them works, depending on
requirements).
See also lambda architecture
> On 20 Jan 2017, at 08:18, Gaurav1809 wrote:
>
> Hi All,
>
>
> Once data is
You run compaction, i.e. save the modified/deleted records in a dedicated delta
file. Every now and then you compare the original and delta file and create a
new version. When querying before compaction you need to check both the
original and the delta file. I do not think ORC needs Tez for it, but it
Hello,
I am not sure what you mean by min/max for strings; I do not know if this makes
sense. What the ORC format has is bloom filters for strings etc. - are you
referring to this?
In order to apply min/max filters Spark needs to read the metadata of the
file. If the filter is applied or
Avro itself supports it, but I am not sure if this functionality is available
through the Spark API. Just out of curiosity: if your use case is only writing
to HDFS then you might simply use Flume.
> On 9 Jan 2017, at 09:58, awkysam wrote:
>
> Currently for our
Firewall Ports open?
Hint: for security reasons you should not connect via the internet.
> On 9 Jan 2017, at 04:30, Raymond Xie wrote:
>
> I want to do some data analytics work by leveraging Databricks spark platform
> and connect my Tableau desktop to it for data
Why not upgrade to ojdbc7 - this one is for Java 7 and 8? Keep in mind that the
JDBC driver is updated constantly (although it is simply called ojdbc7). I
would be surprised if this does not work with Cloudera, as it runs on the
Oracle Big Data Appliance.
> On 22 Dec 2016, at 21:44, Mich Talebzadeh
I am currently developing one: https://github.com/ZuInnoTe/hadoopoffice
It contains working source code, but a release will likely only come at the
beginning of the year (it will include a Spark data source, but the existing
source code can be used without issues in a Spark application).
> On 19 Dec 2016,
In Hadoop you should not have many small files. Put them into a HAR.
> On 13 Dec 2016, at 05:42, Jakob Odersky wrote:
>
> Assuming the bottleneck is IO, you could try saving your files to
> HDFS. This will distribute your data and allow for better concurrent
> reads.
>
>> On
Tar is not supported out of the box. Just store the file as .json.bz2 without
using tar.
> On 8 Dec 2016, at 20:18, Maurin Lenglart wrote:
>
> Hi,
> I am trying to load a json file compress in .tar.bz2 but spark throw an error.
> I am using pyspark with spark 1.6.2.
You need to do the bookkeeping of what has been processed yourself. This may
mean roughly the following (of course the devil is in the details):
Write down in ZooKeeper which part of the processing job has been done and for
which dataset all the data has been created (do not keep the data
If you do it frequently then you may simply copy the data to the processing
cluster. Alternatively, you could create an external table in the processing
cluster pointing to the analytics cluster. However, this has to be supported by
appropriate security configuration and might be less efficient than
I am not sure what use case you want to demonstrate with select count in
general. Maybe you can elaborate more what your use case is.
Aside from this: this is a Cassandra issue. What is the setup of Cassandra?
Dedicated nodes? How many? Replication strategy? Consistency configuration? How
is
Use as a format ORC, Parquet or Avro, because they support any compression type
with parallel processing. Alternatively, split your file into several smaller
ones. Another alternative would be bzip2 (but slower in general) or LZO
(usually not included by default in many distributions).
> On
Once you have configured a custom file system in Hadoop it can be used by Spark
out of the box. Depending on what you implement in the custom file system you
may need to think about side effects on any application, including Spark
(memory consumption etc.).
> On 21 Nov 2016, at 18:26, Samy Dindane
You can do the conversion of the character set (is this the issue?) as part of
your loading process in Spark.
As far as I know the Spark CSV package is based on the Hadoop TextInputFormat.
This format, to the best of my knowledge, supports only UTF-8, so you have to
do a conversion from the Windows encoding to UTF-8.
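A minimal sketch of such a conversion step before handing the file to Spark (the cp1252 encoding and the file paths are assumptions for illustration; the actual Windows encoding may differ):

```python
# Sketch: re-encode a Windows (cp1252, assumed) CSV file to UTF-8 before
# loading it with Spark's CSV reader.
import os
import tempfile

src = os.path.join(tempfile.mkdtemp(), "input_cp1252.csv")
dst = src.replace("input_cp1252", "input_utf8")

# Create a sample file in the Windows encoding (illustrative data).
with open(src, "wb") as f:
    f.write("id;name\n1;Müller\n".encode("cp1252"))

# Decode as cp1252, write back as UTF-8.
with open(src, "r", encoding="cp1252") as fin, \
     open(dst, "w", encoding="utf-8") as fout:
    for line in fin:
        fout.write(line)

with open(dst, "rb") as f:
    data = f.read()
```

For large files this per-line streaming keeps memory usage constant; the same logic could also run inside a Spark job via binary reads.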
Spark version? Are you using Tungsten?
> On 14 Nov 2016, at 10:05, Prithish wrote:
>
> Can someone please explain why this happens?
>
> When I read a 600kb AVRO file and cache this in memory (using cacheTable), it
> shows up as 11mb (storage tab in Spark UI). I have tried
What is wrong with the good old batch transfer for transferring data from a
cluster to another? I assume your use case is only business continuity in
case of disasters such as data center loss, which are unlikely to happen
(well it does not mean they do not happen) and where you could afford to
Can you split the files beforehand into several files (e.g. by the column you
do the join on)?
> On 10 Nov 2016, at 23:45, Stuart White wrote:
>
> I have a large "master" file (~700m records) that I frequently join smaller
> "transaction" files to. (The transaction
Basically you mention the options. However, there are several ways how
informatica can extract (or store) from/to rdbms. If the native option is not
available then you need to go via JDBC as you have described.
Alternatively (but only if it is worth it) you can schedule fetching of the
files
>
>
>> On 24 October 2016 at 17:09, Jörn Franke <jornfra...@gmail.com> wrote:
>> Bigtop contain
Bigtop contains a random data generator, mainly for transactions, but it could
be rather easily adapted.
> On 24 Oct 2016, at 18:04, Mich Talebzadeh wrote:
>
> me being lazy
>
> Does anyone have a library to create an array of random numbers from normal
>
Have you verified that this class is in the fat jar? It looks like it misses
some of the HBase libraries ...
> On 21 Oct 2016, at 11:45, Mich Talebzadeh wrote:
>
> Still does not work with Spark 2.0.0 on apache-phoenix-4.8.1-HBase-1.2-bin
>
> thanks
>
> Dr Mich
What is the use case of this? You will reduce performance significantly.
Nevertheless, the way you propose is the way to go, but I do not recommend it.
> On 20 Oct 2016, at 14:00, Ashan Taha wrote:
>
> Hi
>
> What’s the best way to make sure an Avro file is NOT Splitable
Careful: HBase with Phoenix is only faster in certain scenarios - when
processing small amounts out of a bigger amount of data (depending on node
memory, the operation, etc.). Hive+Tez+ORC can be rather competitive; LLAP
makes sense for interactive ad-hoc queries that are rather
stributed NoSQL engine.
> Remember Big Data isn’t relational its more of a hierarchy model or record
> model. Think IMS or Pick (Dick Pick’s revelation, U2, Universe, etc …)
>
>
>> On Oct 17, 2016, at 3:45 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>
>>> required. This also goes for the REST endpoints. 3rd party services will
>>> hit ours to update our data with no need to read from our data. And, when
>>> we want to update their data, we will hit theirs to update their data using
>>> a triggered job.
>>>
Which Spark version?
Are you using RDDs or Datasets?
What type are the features? If string, how large?
Is it Spark standalone?
How do you train/configure the algorithm? How do you initially parse the data?
The standard driver and executor logs could be helpful.
> On 12 Oct 2016, at 09:24, 陈哲
Your file sizes are too small; this has a significant impact on the NameNode.
Use HBase or maybe HAWQ to store small writes.
> On 10 Oct 2016, at 16:25, Kevin Mellott wrote:
>
> Whilst working on this application, I found a setting that drastically
> improved the
Use a tool like Flume and/or Oozie to reliably download files from HTTP and do
error handling (e.g. request timeouts). Afterwards process the data with Spark.
> On 25 Sep 2016, at 10:27, Dan Bikle wrote:
>
> hello spark-world,
>
> How to use Spark-Scala to download a CSV
As Cody said, Spark is not going to help you here.
There are two issues you need to look at here: duplicated (or even more)
messages processed by two different processes and the case of failure of any
component (including the message broker). Keep in mind that duplicated messages
can even
Depends what your use case is. A generic benchmark does not make sense, because
they are different technologies for different purposes.
> On 23 Sep 2016, at 06:09, ayan guha wrote:
>
> Hi
>
> Is there any benchmark or point of view in terms of pros and cons between AWS
>
will kill the process because it's using more
>> >> memory than it asked for. A JVM is always going to use a little
>> >> off-heap memory by itself, so setting a max heap size of 2GB means the
>> >> JVM process may use a bit more than 2GB of memory. With an off-heap
>>
tensive app like Spark it can be a lot more.
>
> There's a built-in 10% overhead, so that if you ask for a 3GB executor
> it will ask for 3.3GB from YARN. You can increase the overhead.
>
> On Wed, Sep 21, 2016 at 11:41 PM, Jörn Franke <jornfra...@gmail.com>
> wrote:
> &
You should also take into account that Spark has different options to represent
data in-memory, such as Java serialized objects, Kryo serialized objects,
Tungsten (columnar, optionally compressed), etc. The Tungsten representation
depends heavily on the underlying data and sorting, especially if compressed.
Then,
All off-heap memory is still managed by the JVM process. If you limit the
memory of this process then you limit the memory. I think the memory of the JVM
process could be limited via the -Xms/-Xmx parameters of the JVM. These can be
configured via Spark options for YARN (be aware that they are