1. make sure your secret key doesn't have a / in it. If it does, generate a
new key.
2. jets3t and hadoop JAR versions need to be in sync; jets3t 0.9.0 was picked
up in Hadoop 2.4, and not earlier, AFAIK
3. Hadoop 2.6 has a new S3 client, s3a, which is compatible with s3n data. It
uses the AWS toolkit
On 24 Mar 2015, at 02:10, Marcelo Vanzin van...@cloudera.com wrote:
This happens most probably because the Spark 1.3 you have downloaded
is built against an older version of the Hadoop libraries than those
used by CDH, and those libraries cannot parse the container IDs
generated by CDH.
On 25 Mar 2015, at 21:54, roni
roni.epi...@gmail.com wrote:
Is there any way that I can install the new one and remove the previous version?
I installed Spark 1.3 on my EC2 master and set the Spark home to the new one.
But when I start the spark-shell I get -
Note that even the Facebook four degrees of separation paper went down to a
single machine running WebGraph (http://webgraph.di.unimi.it/) for the final
steps, after running jobs in their Hadoop cluster to build the dataset for that
final operation.
The computations were performed on a
On 30 Mar 2015, at 13:27, jay vyas
jayunit100.apa...@gmail.com wrote:
Just the same as spark was disrupting the hadoop ecosystem by changing the
assumption that you can't rely on memory in distributed analytics...now maybe
we are challenging the assumption
It's worth adding that there's no guarantee that re-evaluated work would be on
the same host as before, and in the case of node failure, it is not guaranteed
to be elsewhere.
this means things that depend on host-local information are going to generate
different numbers even if there are no
On 21 Apr 2015, at 17:34, Richard Marscher
rmarsc...@localytics.com wrote:
- There are System.exit calls built into Spark as of now that could kill your
running JVM. We have shadowed some of the most offensive bits within our own
application to work around this.
S3a isn't ready for production use on anything below Hadoop 2.7.0. I say that
as the person who mentored in all the patches for it between Hadoop 2.6 and 2.7.
You need everything in https://issues.apache.org/jira/browse/HADOOP-11571 in
your code.
- Hadoop 2.6.0 doesn't have any of the HADOOP-11571 fixes
the key thing would be to use different ZK paths for each cluster. You
shouldn't need more than 2 ZK quorums even for a large (few thousand node)
Hadoop cluster: one for the HA bits of the infrastructure (HDFS, YARN) and one
for the applications to abuse. It's easy for apps using ZK to stick
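As a sketch of what "different ZK paths per cluster" can look like on the Spark side (the quorum address and paths below are assumptions, not from this thread): each Spark standalone master shares the quorum but gets its own spark.deploy.zookeeper.dir, normally passed to the master via SPARK_DAEMON_JAVA_OPTS.
```
// Sketch only: the recovery properties each standalone master would be started with.
// Both clusters share one ZK quorum; only the ZK path differs.
val common = Map(
  "spark.deploy.recoveryMode"  -> "ZOOKEEPER",
  "spark.deploy.zookeeper.url" -> "zk1:2181,zk2:2181,zk3:2181")
val clusterA = common + ("spark.deploy.zookeeper.dir" -> "/spark/cluster-a")
val clusterB = common + ("spark.deploy.zookeeper.dir" -> "/spark/cluster-b")
```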
This is a Hadoop-side stack trace
it looks like the code is trying to get the filesystem permissions by running
%HADOOP_HOME%\bin\WINUTILS.EXE ls -F
and something is triggering a null pointer exception.
There isn't any HADOOP- JIRA with this specific stack trace in it, so it's not
a
On 27 Apr 2015, at 07:51, ÐΞ€ρ@Ҝ (๏̯͡๏)
deepuj...@gmail.com wrote:
Spark 1.3
1. View stderr/stdout from executor from Web UI: when the job is running I
figured out the executor that I am supposed to see, and those two links show 4
special characters in the browser.
2.
On 19 May 2015, at 03:08, Justin Pihony justin.pih...@gmail.com wrote:
15/05/18 22:03:14 INFO Executor: Fetching
http://192.168.56.1:49752/jars/twitter4j-media-support-3.0.3.jar with
timestamp 1432000973058
15/05/18 22:03:14 INFO Utils: Fetching
I think you may want to try emailing things to the storm users list, not the
spark one
On 11 May 2015, at 15:42, Tyler Mitchell
tyler.mitch...@actian.com wrote:
I've had good success with the Splunk event generator.
https://github.com/coccyx/eventgen/blob/master/README.md
On 16 May 2015, at 04:39, Anton Brazhnyk
anton.brazh...@genesys.com wrote:
For me it wouldn't help I guess, because those newer classes would still be
loaded by a different classloader.
What did work for me with 1.3.1 was removing those classes from Spark's jar
On 15 May 2015, at 21:20, Mohammad Tariq donta...@gmail.com wrote:
Thank you Ayan and Ted for the prompt response. It isn't working with s3n
either.
And I am able to download the file. In fact I am able to read the same file
using s3 API without any issue.
sounds like an S3n
On 10 Apr 2015, at 13:40, Lorenz Knies m...@l1024.org wrote:
I would consider it a bug that the "Yarn application state monitor" thread
dies on an exception that is, I think, even expected (at least in the Java
methods called further down the stack).
What do you think? Is it a problem that we
On 6 Apr 2015, at 23:05, Patrick Young
patrick.mckendree.yo...@gmail.com
wrote:
does anyone have any thoughts on storing a really large raster in HDFS? Seems
like if I just dump the image into HDFS as is, it'll get stored in blocks all
across the
This means the spark workers exited with code 15; probably nothing YARN
related itself (unless there are classpath-related problems).
Have a look at the logs of the app/container via the resource manager. You can
also increase the time that logs get kept on the nodes themselves to something
On 5 Jun 2015, at 08:03, Pierre B pierre.borckm...@realimpactanalytics.com
wrote:
Hi list!
My problem is quite simple.
I need to access several S3 buckets, using different credentials:
```
val c1 =
  sc.textFile("s3n://[ACCESS_KEY_ID1:SECRET_ACCESS_KEY1]@bucket1/file.csv").count
val c2
```
On 2 Jun 2015, at 00:14, Dean Wampler
deanwamp...@gmail.com wrote:
It would be nice to see the code for MapR FS Java API, but my google foo failed
me (assuming it's open source)...
I know that MapRFS is closed source, don't know about the java JAR. Why not ask
On 4 Jun 2015, at 15:59, Chao Chen kandy...@gmail.com wrote:
But when I try to run the Pagerank from HiBench, it always causes a node to
reboot in the middle of the job for all the Scala, Java, and Python
versions, but it works fine
with the MapReduce version from the same benchmark.
do
On 8 Jun 2015, at 15:55, Richard Marscher
rmarsc...@localytics.com wrote:
Hi,
we've been seeing occasional issues in production with the FileOutputCommitter
reaching a deadlock situation.
We are writing our data to S3 and currently have speculation enabled. What
On 20 Jun 2015, at 17:37, Ashish Soni asoni.le...@gmail.com wrote:
Can anyone help? I am getting the below error when I try to start the History
Server.
I do not see any org.apache.spark.deploy.yarn.history package inside the
assembly jar; not sure how to get that
On 19 Jun 2015, at 16:48, Sea 261810...@qq.com wrote:
Hi, all:
I run Spark on YARN. I want to see the Jobs UI at http://ip:4040/,
but it redirects to http://${yarn.ip}/proxy/application_1428110196022_924324/
which cannot be found. Why?
Can anyone help?
whenever you point
On 17 Jun 2015, at 19:10, jcai jonathon@yale.edu wrote:
Hi,
I am running this on Spark stand-alone mode. I find that when I examine the
web UI, a couple of bugs arise:
1. There is a discrepancy between the number denoting the duration of the
application when I run the history server
On 22 Jun 2015, at 04:08, Shawn Garbett
shawn.garb...@gmail.com wrote:
2015-06-21 11:03:22,029 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Container [pid=39288,containerID=container_1434751301309_0015_02_01]
you are using a Guava version on the classpath which your version of Hadoop
can't handle. Try Guava version 15, or build Spark against Hadoop 2.7.0
On 24 Jun 2015, at 19:03, maxdml max...@cs.duke.edu wrote:
Exception in thread "main" java.lang.NoSuchMethodError:
On 26 Jun 2015, at 09:29, Ashic Mahtab as...@live.com
wrote:
Thanks for the replies, guys.
Is this a permanent change as of 1.3, or will it go away at some point?
Don't blame the Spark team; complain to the Hadoop team for being slow to
embrace the Java 1.7 APIs for
On 24 Jun 2015, at 05:55, canan chen
ccn...@gmail.com wrote:
Why do you want it to wait to start until all the resources are ready? Making
it start as early as possible should make it complete earlier and increase the
utilization of resources
On Tue, Jun 23, 2015 at 10:34 PM, Arun
On 23 Jun 2015, at 00:09, Danny kont...@dannylinden.de wrote:
hi,
have you tested
s3://ww-sandbox/name_of_path/ instead of s3://ww-sandbox/name_of_path
+ make sure the bucket is there already. Hadoop s3 clients don't currently
handle that step
or have you tested adding your file
That's the Tachyon FS there, which appears to be missing a method override.
On 12 Jun 2015, at 19:58, Peter Haumer
phau...@us.ibm.com wrote:
Exception in thread "main" java.lang.UnsupportedOperationException: Not
implemented by the TFS FileSystem implementation
at
These are both really good posts: you should try and get them into the
documentation.
With anything implementing dynamic behaviour, there are some fun problems:
(a) detecting the delays in the workflow. There are some good ideas here
(b) deciding where to address it. That means you need to monitor the
On 15 Jun 2015, at 15:43, Borja Garrido Bear
kazebo...@gmail.com wrote:
I tried running the job in a standalone cluster and I'm getting this:
java.io.IOException: Failed on local exception: java.io.IOException:
org.apache.hadoop.security.AccessControlException:
For that you need SPARK-1537 and the patch to go with it
It is still the Spark web UI; it just hands off storage and retrieval of the
history to the underlying YARN timeline server, rather than going through the
filesystem. You'll get to see things as they go along too.
If you do want to try it,
s3a uses Amazon's own libraries; it's tested against Frankfurt too.
You have to view s3a support in Hadoop 2.6 as a beta release: it works, with some
issues. Hadoop 2.7.0+ has it all working now, though you are left with the task of
getting hadoop-aws and the Amazon JAR onto your classpath via the
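One way to get them there (a sketch; the artifact version is an assumption, and spark.jars.packages/--packages needs Spark 1.5+) is to let spark-submit resolve hadoop-aws, which pulls in the matching AWS SDK:
```
// Sketch: pull hadoop-aws (and, transitively, the AWS SDK) onto the classpath.
val conf = new org.apache.spark.SparkConf()
  .set("spark.jars.packages", "org.apache.hadoop:hadoop-aws:2.7.1")
// equivalent: spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.1 ...
```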
On 29 Jun 2015, at 14:18, Dave Ariens
dari...@blackberry.com wrote:
I'd like to toss out another idea that doesn't involve a complete end-to-end
Kerberos implementation. Essentially, have the driver authenticate to
Kerberos, instantiate a Hadoop file system, and
On 27 Jun 2015, at 07:56, Tim Chen
t...@mesosphere.io wrote:
Does YARN provide the token through that env variable you mentioned? Or how
does YARN do this?
Roughly:
1. client-side launcher creates the delegation tokens and adds them as byte[]
data to the
On Thu, Jul 2, 2015 at 7:38 AM, Daniel Haviv
daniel.ha...@veracity-group.com wrote:
Hi,
I'm trying to start the thrift-server and pass it Azure's blob storage jars,
but I'm failing on:
Caused by: java.io.IOException: No FileSystem for scheme: wasb
On 24 Jun 2015, at 18:56, Kevin Liu kevin...@fb.com
wrote:
Continuing this thread beyond standalone - onto clusters, does anyone have
experience successfully running any Spark cluster on IPv6 only (not dual stack)
machines? More companies are moving to IPv6 and some such
That's Spark on YARN in Kerberos.
In Spark 1.3 you can submit work to a Kerberized Hadoop cluster; once the
tokens you passed up with your app submission expire (~72 hours) your job can't
access HDFS any more.
That's been addressed in Spark 1.4, where you can now specify a kerberos keytab
for
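A sketch of that Spark 1.4+ mechanism (the principal and keytab path are placeholder assumptions): give Spark a keytab so it can re-login and refresh its own tokens for long-running jobs.
```
// Sketch: keytab-based login for a long-running job on a kerberized cluster.
val conf = new org.apache.spark.SparkConf()
  .set("spark.yarn.principal", "etl-user@EXAMPLE.COM")
  .set("spark.yarn.keytab", "/etc/security/keytabs/etl-user.keytab")
// equivalent: spark-submit --principal etl-user@EXAMPLE.COM \
//   --keytab /etc/security/keytabs/etl-user.keytab ...
```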
With the right FTP client JAR on your classpath (I forget which), you can use
ftp:// as a source for a Hadoop FS operation. You may even be able to use it as
an input for a Spark (non-streaming) job directly.
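A minimal sketch of that (the host, credentials and path are made up, and it assumes the FTP filesystem support and its client JAR are on the classpath):
```
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: read a file straight off an FTP server as an RDD of lines.
val sc = new SparkContext(new SparkConf().setAppName("ftp-read"))
val lines = sc.textFile("ftp://user:password@ftp.example.com/data/input.txt")
println(lines.count())
```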
On 14 Aug 2015, at 14:11, Varadhan, Jawahar
there's a spark-submit.cmd file for Windows. Does that work?
On 27 Jul 2015, at 21:19, Proust GZ Feng
pf...@cn.ibm.com wrote:
Hi, Spark Users
Looks like Spark 1.4.0 cannot work with Cygwin due to the removal of Cygwin
support in bin/spark-class
The changeset is
` and
`double` were now types, and UNION had evolved.)
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Tuesday, August 4, 2015 11:53 PM
To: Steve Loughran ste...@hortonworks.com
Cc: Ishwardeep Singh
ishwardeep.si...@impetus.co.in
, Steve Loughran ste...@hortonworks.com wrote:
Think it may be needed on Windows, certainly if you start trying to work
with local files.
On 4 Aug 2015, at 00:34, Sean Owen so...@cloudera.com wrote:
It won't affect you if you're not actually running Hadoop. But it's
mainly things like Snappy
The redirect is there for security reasons: in a Kerberos-enabled cluster the
RM proxy does the authentication, then forwards the requests to the running
application. There's no obvious way to disable it in the Spark application
master, and I wouldn't recommend doing so anyway,
On 3 Aug 2015, at 10:05, MrJew kouz...@gmail.com wrote:
Hello,
Similar to other cluster systems, e.g. Zookeeper,
Actually, Zookeeper supports SASL authentication of your Kerberos tokens.
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zookeeper+and+SASL
Hazelcast. Spark has the
On 1 Aug 2015, at 18:26, Ruslan Dautkhanov
dautkha...@gmail.com wrote:
If your network is bandwidth-bound, you may see that setting jumbo frames (MTU
9000) increases bandwidth by up to ~20%.
On 2 Aug 2015, at 13:42, Sujit Pal
sujitatgt...@gmail.com wrote:
There is no additional configuration on the external Solr host from my code; I
am using the default HttpClient provided by HttpSolrServer. According to the
Javadocs, you can pass in an HttpClient
you need to fix your configuration so that the resource manager hostname/URL is
set... that address there is the "listen on any port" default
On 30 Jul 2015, at 10:47, Nirav Patel
npa...@xactlycorp.com wrote:
15/07/29 11:19:26 INFO client.RMProxy: Connecting to
try looking at the causes and steps here
https://wiki.apache.org/hadoop/BindException
On 28 Jul 2015, at 09:22, Wayne Song
wayne.e.s...@gmail.com wrote:
I made this message with the Nabble web interface; I included the stack trace
there, but I guess it didn't
On 10 Aug 2015, at 20:17, Akshat Aranya
aara...@gmail.com wrote:
Hi Jerry, Akhil,
Thanks for your help. With s3n, the entire file is downloaded even while just
creating the RDD with sqlContext.read.parquet(). It seems like even just
opening and closing the
Think it may be needed on Windows, certainly if you start trying to work with
local files.
On 4 Aug 2015, at 00:34, Sean Owen so...@cloudera.com wrote:
It won't affect you if you're not actually running Hadoop. But it's
mainly things like Snappy/LZO compression which are implemented as
Spark 1.3.1 and 1.4 only support Hive 0.13.
Spark 1.5 is going to be released against Hive 1.2.1; it'll skip Hive 0.14
support entirely and go straight to the currently supported Hive release.
See SPARK-8064 for the gory details
On 3 Aug 2015, at 23:01, Ishwardeep Singh
There's no support for IAM roles in the s3n:// client code in Apache Hadoop
(HADOOP-9384); Amazon's modified EMR distro may have it.
The s3a filesystem adds it; this is ready for production use in Hadoop 2.7.1+
(implicitly HDP 2.3; CDH 5.4 has cherry-picked the relevant patches). I don't
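As a sketch of what IAM-role-based access looks like (bucket and path are assumptions): with hadoop-aws and the AWS SDK on the classpath, and the EC2 instance carrying a suitable IAM role, s3a needs no keys in the URL or configuration.
```
// Sketch: no credentials in the code; s3a picks up the instance's IAM role.
// Assumes an existing SparkContext sc.
val rdd = sc.textFile("s3a://my-bucket/logs/2015/07/")
rdd.take(5).foreach(println)
```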
On 23 Jul 2015, at 10:47, Greg Anderson gregory.ander...@familysearch.org
wrote:
So when I go to ~/ephemeral-hdfs/bin/hadoop and check its version, it says
Hadoop 2.0.0-cdh4.2.0. If I run pyspark and use the s3a address, things
should work, right? What am I missing? And thanks so
On 23 Jul 2015, at 01:50, Ewan Leith ewan.le...@realitymine.com wrote:
I think the standard S3 driver used in Spark from the Hadoop project (S3n)
doesn't support IAM role based authentication.
However, S3a should support it. If you're running Hadoop 2.6 via the
spark-ec2 scripts (I'm
I wouldn't try to play with forwarding/tunnelling; it's always hard to work out
what ports get used everywhere, and the services like hostname == URL in paths.
Can't you just set up an entry in the Windows /etc/hosts file? It's what I do
(on Unix) to talk to VMs
On 25 Aug 2015, at 04:49, Dino
Just try dropping in that JAR. Hadoop core ships with an out-of-date Guava JAR
to avoid breaking old code downstream, but 2.7.x is designed to work with later
versions too (i.e. it has moved off any of the now-removed methods). See
https://issues.apache.org/jira/browse/HADOOP-10101 for the
> On 22 Oct 2015, at 15:12, Ashish Shrowty wrote:
>
> I understand that there is some incompatibility with the API between Hadoop
> 2.6/2.7 and Amazon AWS SDK where they changed a signature of
>
On 22 Oct 2015, at 02:47, Ajay Chander
> wrote:
Thanks for your time. I have followed your inputs and downloaded
"spark-1.5.1-bin-hadoop2.6" on one of the node say node1. And when I did a pie
test everything seems to be working fine, except that
> On 26 Oct 2015, at 09:28, Jinfeng Li wrote:
>
> Replication factor is 3 and we have 18 data nodes. We check HDFS webUI, data
> is evenly distributed among 18 machines.
>
every block in HDFS (usually 64, 128 or 256 MB) is replicated across three
machines, meaning 3
On 24 Oct 2015, at 00:46, Lin Zhao >
wrote:
I have a spark on YARN deployed using Cloudera Manager 5.4. The installation
went smoothly. But when I try to run spark-shell I get a long list of
exceptions saying "failed to bind to: /public_ip_of_host:0"
better wiki entry https://wiki.apache.org/hadoop/BindException
> On 28 Oct 2015, at 13:19, Bob Corsaro wrote:
>
> Has anyone successful built this? I'm trying to determine if there is a
> defect in the source package or something strange about my environment. I get
> a FileNotFound exception on MQTTUtils.class during the build of the
e APIs, it's not seamless to glue it up
with the spark context metric registry
On Mon, Oct 26, 2015 at 11:14 AM, Steve Loughran
<ste...@hortonworks.com> wrote:
> On 26 Oct 2015, at 09:28, Jinfeng Li
> <liji...@gmail.com>
try:
mvn test -pl sql -DwildcardSuites=org.apache.spark.sql -Dtest=none
On 12 Nov 2015, at 03:13, weoccc >
wrote:
Hi,
I am wondering how to run unit tests for a specific Spark component only.
mvn test -DwildcardSuites="org.apache.spark.sql.*"
looks suspiciously like some thrift transport unmarshalling problem, THRIFT-2660
Spark 1.5 uses Hive 1.2.1; it should have the relevant Thrift JAR too.
Otherwise, you could play with thrift JAR versions yourself —maybe it will
work, maybe not...
On 13 Nov 2015, at 00:29, Yana Kadiyska
On 17 Nov 2015, at 15:39, Nikhil Gs
> wrote:
Hello Everyone,
Firstly, thank you so much for the response. In our cluster, we are using Spark
1.3.0 and our cluster version is CDH 5.4.1. Yes, we are also using Kerberos in
our cluster
> On 5 Nov 2015, at 00:12, Lan Jiang wrote:
>
> I have used protobuf 3 successfully with Spark on CDH 5.4, even though Hadoop
> itself comes with protobuf 2.5. I think the steps apply to HDP too. You need
> to do the following
Protobuf.jar has been so brittle in the past
On 5 Nov 2015, at 02:03, Younes Naguib
> wrote:
Hi all,
I’m reading large text files from S3. Sizes range between 30GB and 40GB.
Every stage runs in 8-9s, except the last 32, which jump to 1-2 min for some reason!
Here is my
On 30 Oct 2015, at 18:05, William Li
> wrote:
Thanks for your response. My secret has a slash (/) in it so it didn’t work…
That's a recurrent problem with the Hadoop/Java S3 clients. Keep trying to
regenerate a secret until you get one that works
On 14 Oct 2015, at 20:56, Marco Mistroni
> wrote:
15/10/14 20:52:35 WARN : Your hostname, MarcoLaptop resolves to a
loopback/non-reachable address: fe80:0:0:0:c5ed:a66d:9d95:5caa%wlan2, but we
couldn't find any external IP address!
> On 15 Oct 2015, at 19:04, Scott Reynolds wrote:
>
> List,
>
> Right now we build our spark jobs with the s3a hadoop client. We do this
> because our machines are only allowed to use IAM access to the s3 store. We
> can build our jars with the s3a filesystem and the
You've hit this:
https://wiki.apache.org/hadoop/WindowsProblems
The next version of Hadoop will fail with a more useful message, including that
wiki link
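The usual workaround (a sketch; the path is an assumption) is to point hadoop.home.dir, or the HADOOP_HOME environment variable, at a directory containing bin\winutils.exe before any Spark/Hadoop code runs:
```
// Sketch: tell the Hadoop shim where winutils.exe lives (C:\hadoop\bin\winutils.exe assumed).
System.setProperty("hadoop.home.dir", "C:\\hadoop")
```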
On 21 Oct 2015, at 00:36, Renato Perini
> wrote:
java.lang.RuntimeException:
On 7 Oct 2015, at 06:28, Krzysztof Zarzycki
> wrote:
Hi Vikram, so you give up on using yarn-cluster mode for launching Spark jobs, is
that right? AFAIK when using yarn-cluster mode, the launch process
(spark-submit) monitors the job running on YARN,
> On 7 Oct 2015, at 09:26, Dominik Fries wrote:
>
> Hello Folks,
>
> We want to deploy several spark projects and want to use a unique project
> user for each of them. Only the project user should start the spark
> application and have the corresponding packages
On 6 Oct 2015, at 01:23, Andrew Or
> wrote:
Both the history server and the shuffle service are backward compatible, but
not forward compatible. This means as long as you have the latest version of
history server / shuffle service running in
On 7 Oct 2015, at 11:06, ayan guha
> wrote:
Can queues also be used to separate workloads?
Yes; that's standard practice. Different YARN queues can have different maximum
memory & CPU, and you can even tag queues as "pre-emptible", so more
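A sketch of routing one workload to its own queue (the queue name is an assumption):
```
// Sketch: submit this application to a dedicated YARN queue.
val conf = new org.apache.spark.SparkConf().set("spark.yarn.queue", "analytics")
// equivalent: spark-submit --queue analytics ...
```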
ifferent SQL engine underneath
Thanks,
Jagat Singh
On Wed, Sep 30, 2015 at 9:37 PM, Vinay Shukla
<vinayshu...@gmail.com> wrote:
Steve is right.
The Spark Thrift Server does not propagate end-user identity downstream yet.
On Wednesday, September
During development, I'd recommend giving Hadoop a version ending with
-SNAPSHOT, and building Spark with Maven, as mvn knows to refresh the snapshot
every day.
You can do this in Hadoop with
mvn versions:set -DnewVersion=2.7.0.stevel-SNAPSHOT
If you are working on hadoop branch-2 or trunk directly, they
On 12 Oct 2015, at 23:11, Marco Mistroni
> wrote:
Hi all,
I have downloaded spark-1.5.1-bin-hadoop2.4.
I have extracted it on my machine, but when I go to the \bin directory and
invoke
spark-shell I get the following exception.
Could anyone
On 11 Jul 2015, at 19:20, Aaron Davidson
ilike...@gmail.com wrote:
Note that if you use multi-part upload, each part becomes 1 block, which allows
for multiple concurrent readers. One would typically use fixed-size block sizes
which align with Spark's default HDFS
One of our clusters runs on AWS with a portion of the nodes being spot nodes.
We would like to force the application master not to run on spot nodes. For
whatever reason, the application master is not able to recover in cases where the
node where it was running suddenly disappears, which is the case
On 17 Nov 2015, at 02:00, Nikhil Gs
> wrote:
Hello Team,
Below is the error which we are facing in our cluster after 14 hours of
starting the spark-submit job. Not able to understand the issue and why it's
facing the below error
On 17 Nov 2015, at 09:54, Kayode Odeyemi
> wrote:
Initially, I submitted 2 jobs to the YARN cluster which were running for 2 days
and suddenly stopped. Nothing in the logs shows the root cause.
48 hours is one of those Kerberos warning times (as is
> On 31 Aug 2015, at 11:02, Daniel Schulz wrote:
>
> Hi guys,
>
> In a nutshell: does Spark check and respect user privileges when
> reading/writing data?
Yes, in a locked down YARN cluster —until your tokens expire
>
> I am curious about the data security
On 31 Aug 2015, at 19:49, Sigurd Knippenberg
> wrote:
I know I can adjust the max open files allowed by the OS but I'd rather fix the
underlying issue.
Bumping up the OS handle limits is step #1 of installing a Hadoop cluster
If it's running the thrift server from Hive, it's got a SQL API for you to
connect to...
On 3 Sep 2015, at 17:03, Dhaval Patel
> wrote:
I am accessing a shared cluster mode Spark environment. However, there is an
existing application
he Spark job doesn't
release its file handles until the end of the job instead of doing that while
my loop iterates.
Sigurd
On Wed, Sep 2, 2015 at 4:33 AM, Steve Loughran
<ste...@hortonworks.com> wrote:
On 31 Aug 2015, at 19:49, Sigurd Knippenberg
s3a:// has a proxy option
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html
s3n: apparently gets set up differently, though I've never tested it
http://stackoverflow.com/questions/20241953/hadoop-distcp-to-s3-behind-http-proxy
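A sketch of the s3a proxy settings (host and port are assumptions), set on the Hadoop configuration Spark uses:
```
// Sketch: route s3a traffic through an HTTP proxy. Assumes an existing SparkContext sc.
sc.hadoopConfiguration.set("fs.s3a.proxy.host", "proxy.example.com")
sc.hadoopConfiguration.set("fs.s3a.proxy.port", "8080")
```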
> On 8 Sep 2015, at 13:51, tariq
> On 12 Sep 2015, at 09:14, Sean Owen wrote:
>
> This is a question for the CDH list. CDH 5.4 has Spark 1.3, and 5.5
> has 1.5. The best thing is to update CDH as a whole if you can.
>
> However it's pretty simple to just run a newer Spark assembly as a
> YARN app. Don't
On 15 Sep 2015, at 05:47, Lan Jiang
> wrote:
Hi, there,
I am using Spark 1.4.1. Protobuf 2.5 is included by Spark 1.4.1 by default.
However, I would like to use Protobuf 3 in my spark application so that I can
use some new features such as Map
> On 15 Sep 2015, at 08:55, Adrian Bridgett wrote:
>
> Hi Sam, in short, no, it's a traditional install as we plan to use spot
> instances and didn't want price spikes to kill off HDFS.
>
> We're actually doing a bit of a hybrid, using spot instances for the mesos
>
On 30 Sep 2015, at 03:24, Mohammed Guller
> wrote:
Does each user need to start their own thrift server to use it?
No. One of the benefits of the Spark Thrift Server is that it allows multiple
users to share a single SparkContext.
Most likely,
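For illustration (a sketch; host, port, table and user are assumptions), each user just points a JDBC client at the one running Thrift Server rather than starting another:
```
import java.sql.DriverManager

// Sketch: a second user's JDBC session against the shared Thrift Server.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://thrift-host:10000/default", "user2", "")
val rs = conn.createStatement().executeQuery("SELECT COUNT(*) FROM some_table")
while (rs.next()) println(rs.getLong(1))
conn.close()
```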
On 1 Oct 2015, at 16:52, Adrian Tanase
> wrote:
This happens automatically as long as you submit with cluster mode instead of
client mode (e.g. ./spark-submit --master yarn-cluster …).
The property you mention would help right after that, although
On 23 Sep 2015, at 14:56, Michal Čizmazia
> wrote:
To get around the fact that flush does not work in S3, my custom WAL
implementation stores a separate S3 object for each WriteAheadLog.write call.
Do you see any gotchas with this approach?
On 23 Sep 2015, at 07:10, Tathagata Das
> wrote:
Responses inline.
On Tue, Sep 22, 2015 at 8:35 PM, Michal Čizmazia
> wrote:
Can checkpoints be stored to S3 (via S3/S3A Hadoop URL)?
Yes. Because
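A minimal sketch of that (bucket and batch interval are assumptions; the S3 credentials are presumed already configured):
```
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch: point the streaming checkpoint directory at an S3A URL.
// Assumes an existing SparkContext sc.
val ssc = new StreamingContext(sc, Seconds(10))
ssc.checkpoint("s3a://my-bucket/spark/checkpoints")
```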
On 22 Sep 2015, at 10:40, Akhil Das
> wrote:
or you can set it in the environment as:
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
I didn't think the Hadoop code looked at those. There aren't any references to
the env
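What the Hadoop S3 clients do read are configuration properties; a sketch (placeholder values), noting that s3n and s3a use different key names:
```
// Sketch: credentials via the Hadoop configuration. Assumes an existing SparkContext sc.
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "SECRET_KEY")
sc.hadoopConfiguration.set("fs.s3a.access.key", "ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "SECRET_KEY")
```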
> On 17 Sep 2015, at 21:40, Tathagata Das wrote:
>
> Actually, the current WAL implementation (as of Spark 1.5) does not work with
> S3 because S3 does not support flushing. Basically, the current
> implementation assumes that after write + flush, the data is immediately
On 25 Sep 2015, at 03:35, Zhang, Jingyu
> wrote:
I got the following exception when I run
JavaPairRDD.values().saveAsTextFile("s3n://bucket"); Can anyone help me out?
Thanks
15/09/25 12:24:32 INFO SparkContext: Successfully stopped