Would something like this help?
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PartitionPruningRDD.scala
On Thu, Jul 24, 2014 at 8:40 AM, Eugene Cheipesh echeip...@gmail.com
wrote:
Hello,
I have an interesting use case for a pre-filtered RDD. I have
- spun up an EC2 cluster successfully using spark-ec2
- tested S3 file access from that cluster successfully
+1
On Tue, Jul 29, 2014 at 1:46 AM, Henry Saputra henry.sapu...@gmail.com
wrote:
NOTICE and LICENSE files look good
Hashes and sigs look good
No executable in the source
PartitionPruningRDD.scala still only handles, as said, the partition portion of
the issue.
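To make the distinction concrete, here is a minimal plain-Python sketch (not Spark code) of the two stages. It assumes a hypothetical range-partitioned dataset where partition i holds the integers 10*i through 10*i+9; the helper names `prune_partitions` and `filter_records` are made up for illustration. Partition pruning only decides which partitions to skip based on their index; the records inside the surviving partitions still have to be filtered one by one.

```python
from typing import Callable, List, Sequence, Tuple


def prune_partitions(
    partitions: Sequence[List[int]],
    keep_partition: Callable[[int], bool],
) -> List[Tuple[int, List[int]]]:
    """Keep only partitions whose index passes the filter (partition pruning)."""
    return [(i, p) for i, p in enumerate(partitions) if keep_partition(i)]


def filter_records(
    pruned: Sequence[Tuple[int, List[int]]],
    keep_record: Callable[[int], bool],
) -> List[int]:
    """Scan the surviving partitions and keep matching records (record pruning)."""
    return [r for _, part in pruned for r in part if keep_record(r)]


# Hypothetical layout: partition i holds the range [10*i, 10*i + 9].
partitions = [list(range(10 * i, 10 * i + 10)) for i in range(4)]

# Query: records in [lo, hi]. A partition can be skipped when its range
# does not overlap the query range.
lo, hi = 12, 27
pruned = prune_partitions(partitions, lambda i: 10 * i <= hi and 10 * i + 9 >= lo)
result = filter_records(pruned, lambda r: lo <= r <= hi)

kept_indices = [i for i, _ in pruned]
scanned = sum(len(part) for _, part in pruned)
print(kept_indices, scanned, len(result))
```

In this toy layout, pruning reduces how many partitions are scanned, but the per-record predicate is still evaluated on everything in the partitions that remain, which is the portion a partition-level filter does not address.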
On the record pruning portion, although cheap fixes could be available for
this issue as reported, I believe a
fundamental issue is the lack of a mechanism for processing merging/pushdown. Given
the
I am not sure if I agree that it lacks the mechanism to do pushdowns.
Hadoop InputFormat itself provides some basic mechanism to push down
predicates already. The HBase InputFormat already implements it. In Spark,
you can also run arbitrary user code, and you can decide what to do. You
can also
Hi Reynold,
I agree that we should not hurry right now to modify/enhance APIs and could be
satisfied with extending existing ones as much as possible. On the other hand,
more intelligent data stores like HBase or Cassandra do support
complex pushdowns, often more complex than their MR
Of late, I've been coming across quite a few pull requests and associated
JIRA issues that contain nothing indicating their purpose beyond a pretty
minimal description of what the pull request does. On the pull request
itself, a reference to the corresponding JIRA in the title combined with a
+1 on this.
On Tue, Jul 29, 2014 at 4:34 PM, Mark Hamstra m...@clearstorydata.com
wrote:
Of late, I've been coming across quite a few pull requests and associated
JIRA issues that contain nothing indicating their purpose beyond a pretty
minimal description of what the pull request does. On
- Original Message -
Sure, drop() would be useful, but breaking the "transformations are lazy;
only actions launch jobs" model is abhorrent -- which is not to say that we
haven't already broken that model for useful operations (cf.
RangePartitioner, which is used for sorted RDDs), but
+1 on using JIRA workflows to manage the backlog, and +9000 on having
decent descriptions for all JIRA issues.
On Tue, Jul 29, 2014 at 7:48 PM, Sean Owen so...@cloudera.com wrote:
How about using a JIRA status like "Documentation Required" to mean
burden's on you to elaborate with a motivation
I agree as well. FWIW sometimes I've seen this happen due to language barriers,
i.e. contributors whose primary language is not English, but we need more
motivation for each change.
On July 29, 2014 at 5:12:01 PM, Nicholas Chammas (nicholas.cham...@gmail.com)
wrote:
+1 on using JIRA workflows