On 23 Sep 2015, at 14:56, Michal Čizmazia <mici...@gmail.com> wrote:
To get around the fact that flush does not work in S3, my custom WAL
implementation stores a separate S3 object per each WriteAheadLog.write
call. Do you see any gotchas with this approach?
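A minimal sketch of that one-object-per-write idea, with a plain dict standing in for the S3 bucket (real code would issue boto3 `put_object`/`get_object`/`list_objects` calls; the class and method names here are hypothetical, not Spark's `WriteAheadLog` API):

```python
class S3ObjectPerWriteWAL:
    """Hypothetical sketch: every write() stores one object under a new
    key, so no flush/append semantics are required from the store."""

    def __init__(self, bucket):
        self.bucket = bucket  # stand-in for an S3 bucket (a dict here)
        self.seq = 0

    def write(self, record, time):
        # One PUT per record. Zero-padding keeps lexicographic key order
        # equal to write order, matching how S3 LIST sorts keys.
        key = "wal/%013d-%010d" % (time, self.seq)
        self.seq += 1
        self.bucket[key] = record
        return key  # the key doubles as the record handle

    def read(self, handle):
        return self.bucket[handle]

    def read_all(self):
        # LIST, then one GET per object, in key (= write) order.
        return [self.bucket[k] for k in sorted(self.bucket)]

    def clean(self, thresh_time):
        # Delete every object written before thresh_time.
        old = [k for k in self.bucket
               if int(k.split("/")[1].split("-")[0]) < thresh_time]
        for k in old:
            del self.bucket[k]
```

One obvious cost of this layout is a PUT request (and its latency) per record, plus a LIST plus many GETs on recovery.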
On 23 September 2015 at 13:12, Steve Loughran <ste...@hortonworks.com>
wrote:
>
> On 23 Sep 2015, at 14:56, Michal Čizmazia <mici...@gmail.com> wrote:
>
> To get around the fact that flush does not work in S3, my custom WAL
> implementation stores a separate S3 object per each WriteAheadLog.write call.
Responses inline.
On Tue, Sep 22, 2015 at 8:35 PM, Michal Čizmazia wrote:
> Can checkpoints be stored to S3 (via S3/S3A Hadoop URL)?
Yes. Checkpoints are single files by themselves and do not require
flush semantics to work, so S3 is fine.
You can keep the checkpoints in the Hadoop-compatible file system and the
WAL somewhere else using your custom WAL implementation. Yes, cleaning up
gets complicated, as it is no longer as easy as deleting the checkpoint
directory - you will have to clean up the checkpoint directory as well as
the WAL.
Can checkpoints be stored to S3 (via S3/S3A Hadoop URL)?
Trying to answer this question, I looked into Checkpoint.getCheckpointFiles
[1]. It is doing findFirstIn which would probably be calling the S3 LIST
operation. S3 LIST is prone to eventual consistency [2]. What would happen
when
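The concern can be made concrete with a toy model of eventually consistent listing (plain Python, nothing S3-specific; S3 LIST behaved this way until AWS made it strongly consistent in late 2020):

```python
class EventuallyConsistentStore:
    """Toy model of list-after-put lag: GET of a new key always works,
    but LIST may not include recently PUT keys until consistency settles."""

    def __init__(self):
        self.objects = {}  # all PUT data (GET by key always succeeds)
        self.listed = set()  # keys currently visible to LIST

    def put(self, key, data):
        self.objects[key] = data  # NOT yet added to self.listed

    def list(self, prefix):
        return sorted(k for k in self.listed if k.startswith(prefix))

    def settle(self):
        self.listed = set(self.objects)  # consistency catches up


store = EventuallyConsistentStore()
store.put("checkpoint/cp-1", b"old")
store.settle()
store.put("checkpoint/cp-2", b"new")  # just written; LIST lags behind

# A getCheckpointFiles-style "find the latest checkpoint" via LIST
# sees only cp-1 here, so recovery would silently miss cp-2.
latest = store.list("checkpoint/")[-1]
```

Under this model, recovery that picks the newest key returned by LIST can restart from a stale checkpoint even though the newer object is already readable by key.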
I am trying to use pluggable WAL, but it can be used only with
checkpointing turned on. Thus I still need to have a Hadoop-compatible
file system.
Is there something like pluggable checkpointing?
Or can WAL be used without checkpointing? What happens when the WAL is
available but the checkpoint is not?
My understanding of pluggable WAL was that it eliminates the need for
having a Hadoop-compatible file system [1].
What is the use of pluggable WAL when it can only be used together with
checkpointing, which still requires a Hadoop-compatible file system?
[1]:
1. Currently, the WAL can be used only with checkpointing turned on,
because it does not make sense to recover from the WAL if there is no
checkpoint information to recover from.
2. Since the current implementation saves the WAL in the checkpoint
directory, they share the same fate - if the checkpoint directory is
deleted, the WAL is deleted as well.
I don't think it would work with multipart upload either. The file is not
visible until the multipart upload is explicitly closed. So even if each
write is a part upload, the parts are not visible until the multipart
upload is closed.
TD
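A toy model of that multipart behaviour (plain Python; the real calls would be boto3's `create_multipart_upload`/`upload_part`/`complete_multipart_upload`):

```python
class MultipartUpload:
    """Toy model of S3 multipart semantics: uploaded parts are buffered
    server-side and the object only appears once the upload is completed."""

    def __init__(self, store, key):
        self.store = store  # stand-in for an S3 bucket (a dict here)
        self.key = key
        self.parts = []

    def upload_part(self, data):
        self.parts.append(data)  # held server-side, not yet an object

    def complete(self):
        # Only now does the key become visible to GET/LIST.
        self.store[self.key] = b"".join(self.parts)


store = {}
up = MultipartUpload(store, "wal/segment-0")
up.upload_part(b"record1")
up.upload_part(b"record2")
# A crash at this point loses both records: the key does not exist yet.
up.complete()
```

So per-part uploads buy nothing for a WAL: readers see either no object or the whole completed segment, never a flushed prefix.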
On Fri, Sep 18, 2015 at 1:55 AM, Steve Loughran wrote:
> On 17 Sep 2015, at 21:40, Tathagata Das wrote:
>
> Actually, the current WAL implementation (as of Spark 1.5) does not work with
> S3 because S3 does not support flushing. Basically, the current
> implementation assumes that after write + flush, the data is immediately
> durable, and readable if the system crashes without closing the WAL file.
I assume you don't use Kinesis.
Are you running Spark 1.5.0?
If you must use S3, is switching to Kinesis possible?
Cheers
On Thu, Sep 17, 2015 at 1:09 PM, Michal Čizmazia wrote:
> How can I make Write Ahead Logs work with S3? Any pointers welcome!
>
> It seems to be a known issue: https://issues.apache.org/jira/browse/SPARK-9215
Actually, the current WAL implementation (as of Spark 1.5) does not work
with S3 because S3 does not support flushing. Basically, the current
implementation assumes that after write + flush, the data is immediately
durable, and readable if the system crashes without closing the WAL file.
This does not hold for S3, where the data becomes visible only after the
file is closed.
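The write + flush assumption holds for local files and for HDFS `hflush()`, which a small sketch can demonstrate (note that `flush()` only makes the data visible to other readers; durability against power loss would additionally need `os.fsync`):

```python
import os
import tempfile

# The WAL writer assumes that after write + flush the data can be read
# back even though the file was never closed. That is true for local
# files (and HDFS hflush), but an S3 object does not exist at all until
# the write is closed/completed.
path = os.path.join(tempfile.mkdtemp(), "wal.log")
writer = open(path, "wb")
writer.write(b"record-1\n")
writer.flush()  # visible to other readers without close()

with open(path, "rb") as reader:  # second handle; writer still open
    data = reader.read()
writer.close()
```

A WAL reader recovering after a crash plays exactly the role of the second handle here, which is why the flush semantics matter.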
How can I make Write Ahead Logs work with S3? Any pointers welcome!
It seems to be a known issue: https://issues.apache.org/jira/browse/SPARK-9215
I am getting this exception when reading write ahead log:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure:
Please could you explain how to use pluggable WAL?
After I implement the WriteAheadLog abstract class, how can I use it? I
want to use it with a Custom Reliable Receiver. I am using Spark 1.4.1.
Thanks!
On 17 September 2015 at 16:40, Tathagata Das wrote:
> Actually, the current WAL implementation (as of Spark 1.5) does not work
> with S3 because S3 does not support flushing.
You could override the Spark conf
"spark.streaming.receiver.writeAheadLog.class" with your class name.
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/util/WriteAheadLogUtils.scala#L30
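For reference, Spark's WriteAheadLogUtils instantiates the configured class reflectively by name. The same lookup pattern in a conceptual Python sketch (`create_wal` is a hypothetical helper, and a standard-library class stands in for a WAL implementation):

```python
import importlib


def create_wal(conf, default_factory):
    """Sketch of WriteAheadLogUtils-style resolution: if the conf key is
    set, load that class by its dotted name; otherwise use the default."""
    class_name = conf.get("spark.streaming.receiver.writeAheadLog.class")
    if class_name is None:
        return default_factory()
    module_name, _, cls_name = class_name.rpartition(".")
    cls = getattr(importlib.import_module(module_name), cls_name)
    return cls()  # Spark additionally passes the SparkConf to the ctor


# Resolve a class from the standard library by its dotted name.
conf = {"spark.streaming.receiver.writeAheadLog.class":
        "collections.OrderedDict"}
wal = create_wal(conf, default_factory=dict)
```

The custom class only has to be on the driver and executor classpath; no registration beyond the conf key is needed.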
On Thu, Sep 17, 2015 at 2:04 PM, Michal Čizmazia