[jira] [Commented] (SPARK-27339) Decimal up cast to higher scale fails while reading parquet to Dataset

2022-09-27 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-27339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609958#comment-17609958 ] sam commented on SPARK-27339: - [~hyukjin.kwon][~wrschneider99] [~ksbalas]. We are working on a work around

[jira] [Updated] (SPARK-40048) Cached partitions are traversed multiple times (invalidating Accumulator consistency)

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-40048: Summary: Cached partitions are traversed multiple times (invalidating Accumulator consistency) (was: Partitions

[jira] [Updated] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-40048: Description: We are trying to use Accumulators to count RDDs without having to force `.count()` on them for

[jira] [Updated] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-40048: Description: We are trying to use Accumulators to count RDDs without having to force `.count()` on them for

[jira] [Updated] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-40048: Description: We are trying to use Accumulators to count RDDs without having to force `.count()` on them for

[jira] [Updated] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-40048: Affects Version/s: 3.2.1 > Partitions are traversed multiple times invalidating Accumulator consistency >

[jira] [Commented] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579231#comment-17579231 ] sam commented on SPARK-40048: - [~hyukjin.kwon] Unfortunately bumping to 3.2.1 did not fix the issue: ```

[jira] [Comment Edited] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579229#comment-17579229 ] sam edited comment on SPARK-40048 at 8/13/22 9:30 AM: -- We tried `3.2.1` and I'm now

[jira] [Commented] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579229#comment-17579229 ] sam commented on SPARK-40048: - We tried `3.2.1` and I'm now looking at

[jira] [Commented] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579226#comment-17579226 ] sam commented on SPARK-40048: - Thanks [~hyukjin.kwon], but we hit a number of issues trying to bump Spark

[jira] [Updated] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-40048: Description: We are trying to use Accumulators to count RDDs without having to force `.count()` on them for

[jira] [Commented] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-11 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578447#comment-17578447 ] sam commented on SPARK-40048: - I've found a very dodgy hack around with this: ``` def

[jira] [Updated] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-11 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-40048: Description: We are trying to use Accumulators to count RDDs without having to force `.count()` on them for

[jira] [Commented] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-11 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578369#comment-17578369 ] sam commented on SPARK-40048: - Also confirmed no eviction seems to be happening with

[jira] [Updated] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-11 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-40048: Description: We are trying to use Accumulators to count RDDs without having to force `.count()` on them for

[jira] [Created] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-11 Thread sam (Jira)
sam created SPARK-40048: --- Summary: Partitions are traversed multiple times invalidating Accumulator consistency Key: SPARK-40048 URL: https://issues.apache.org/jira/browse/SPARK-40048 Project: Spark

[jira] [Commented] (SPARK-10000) Consolidate storage and execution memory management

2022-07-28 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-10000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17572486#comment-17572486 ] sam commented on SPARK-10000: - Is there any way to disable Spark from evicting rdd partitions from memory

[jira] [Commented] (SPARK-14289) Support multiple eviction strategies for cached RDD partitions

2022-07-28 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-14289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17572485#comment-17572485 ] sam commented on SPARK-14289: - Is there any way to disable Spark from evicting rdd partitions from memory

[jira] [Updated] (SPARK-36966) Spark evicts RDD partitions instead of allowing OOM

2021-10-16 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-36966: Affects Version/s: 1.6.0 > Spark evicts RDD partitions instead of allowing OOM >

[jira] [Updated] (SPARK-36966) Spark evicts RDD partitions instead of allowing OOM

2021-10-16 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-36966: Description: In the past Spark (pre 1.6) jobs would give OOM if an RDD could not fit into memory (when trying to

[jira] [Updated] (SPARK-36966) Spark evicts RDD partitions instead of allowing OOM

2021-10-16 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-36966: Description: In the past Spark jobs would give OOM if an RDD could not fit into memory (when trying to cache

[jira] [Created] (SPARK-36966) Spark evicts RDD partitions instead of allowing OOM

2021-10-09 Thread sam (Jira)
sam created SPARK-36966: --- Summary: Spark evicts RDD partitions instead of allowing OOM Key: SPARK-36966 URL: https://issues.apache.org/jira/browse/SPARK-36966 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-34733) Spark UI not showing memory used of partitions in memory

2021-03-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-34733: Description: We have a job that caches RDDs into memory. We know the code to cache is working as the spark logs

[jira] [Updated] (SPARK-34733) Spark UI not showing memory used of partitions in memory

2021-03-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-34733: Description: We have a job that caches RDDs into memory. We know the code to cache is working as the spark logs

[jira] [Created] (SPARK-34733) Spark UI not showing memory used of partitions in memory

2021-03-13 Thread sam (Jira)
sam created SPARK-34733: --- Summary: Spark UI not showing memory used of partitions in memory Key: SPARK-34733 URL: https://issues.apache.org/jira/browse/SPARK-34733 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-34733) Spark UI not showing memory used of partitions in memory

2021-03-13 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-34733: Attachment: Screenshot 2021-03-13 at 16.31.06.png > Spark UI not showing memory used of partitions in memory >

[jira] [Commented] (SPARK-30101) spark.sql.shuffle.partitions is not in Configuration docs, but a very critical parameter

2019-12-05 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988676#comment-16988676 ] sam commented on SPARK-30101: - [~kabhwan] [~cloud_fan] [~sowen] > We may deal with it we strongly agree

[jira] [Updated] (SPARK-30101) spark.sql.shuffle.partitions is not in Configuration docs, but a very critical parameter

2019-12-03 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-30101: Description: I'm creating a `SparkSession` like this: ``` SparkSession

[jira] [Updated] (SPARK-30101) spark.sql.shuffle.partitions is not in Configuration docs, but a very critical parameter

2019-12-03 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-30101: Summary: spark.sql.shuffle.partitions is not in Configuration docs, but a very critical parameter (was: Dataset

[jira] [Reopened] (SPARK-30101) Dataset distinct does not respect spark.default.parallelism

2019-12-03 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam reopened SPARK-30101: - What is expected, is what is documented. > Dataset distinct does not respect spark.default.parallelism >

[jira] [Commented] (SPARK-30101) Dataset distinct does not respect spark.default.parallelism

2019-12-03 Thread sam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986722#comment-16986722 ] sam commented on SPARK-30101: - [~cloud_fan] [~kabhwan] Well this is at least a documentation error since

[jira] [Created] (SPARK-30101) Dataset distinct does not respect spark.default.parallelism

2019-12-02 Thread sam (Jira)
sam created SPARK-30101: --- Summary: Dataset distinct does not respect spark.default.parallelism Key: SPARK-30101 URL: https://issues.apache.org/jira/browse/SPARK-30101 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-26770) Misleading/unhelpful error message when wrapping a null in an Option

2019-01-29 Thread sam (JIRA)
sam created SPARK-26770: --- Summary: Misleading/unhelpful error message when wrapping a null in an Option Key: SPARK-26770 URL: https://issues.apache.org/jira/browse/SPARK-26770 Project: Spark Issue

[jira] [Updated] (SPARK-26534) Closure Cleaner Bug

2019-01-07 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-26534: Description: I've found a strange combination of closures where the closure cleaner doesn't seem to be smart

[jira] [Commented] (SPARK-26534) Closure Cleaner Bug

2019-01-07 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735699#comment-16735699 ] sam commented on SPARK-26534: - [~viirya] If I change to RDD I cannot reproduce either.  This is further

[jira] [Commented] (SPARK-26534) Closure Cleaner Bug

2019-01-06 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735211#comment-16735211 ] sam commented on SPARK-26534: - [~viirya] Your version is slightly different, can you reproduce using exactly

[jira] [Created] (SPARK-26534) Closure Cleaner Bug

2019-01-04 Thread sam (JIRA)
sam created SPARK-26534: --- Summary: Closure Cleaner Bug Key: SPARK-26534 URL: https://issues.apache.org/jira/browse/SPARK-26534 Project: Spark Issue Type: Bug Components: Spark Core

[jira] [Comment Edited] (SPARK-2243) Support multiple SparkContexts in the same JVM

2018-11-12 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683600#comment-16683600 ] sam edited comment on SPARK-2243 at 11/12/18 11:24 AM: --- Big bonus of being able to

[jira] [Commented] (SPARK-2243) Support multiple SparkContexts in the same JVM

2018-11-12 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683600#comment-16683600 ] sam commented on SPARK-2243: Big bonus of being able to create and shutdown SparkContexts is to be able to

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2018-05-30 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494852#comment-16494852 ] sam commented on SPARK-20144: - Regarding the original issue of sorting, I agree with [~srowen] in that it

[jira] [Created] (SPARK-24425) Regression from 1.6 to 2.x - Spark no longer respects input partitions, unnecessary shuffle required

2018-05-30 Thread sam (JIRA)
sam created SPARK-24425: --- Summary: Regression from 1.6 to 2.x - Spark no longer respects input partitions, unnecessary shuffle required Key: SPARK-24425 URL: https://issues.apache.org/jira/browse/SPARK-24425

[jira] [Commented] (SPARK-6190) create LargeByteBuffer abstraction for eliminating 2GB limit on blocks

2018-03-21 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407729#comment-16407729 ] sam commented on SPARK-6190: [~bdolbeare] [~UZiVcbfPXaNrMtT] I completely agree that it's depressing many of

[jira] [Commented] (SPARK-17998) Reading Parquet files coalesces parts into too few in-memory partitions

2018-01-10 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320041#comment-16320041 ] sam commented on SPARK-17998: - [~srowen] Thanks, no idea where I got that from, cursed weakly typed silently

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2017-10-13 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203882#comment-16203882 ] sam commented on SPARK-20144: - I think this is a regression. We used to be able to easily control the number

[jira] [Commented] (SPARK-17998) Reading Parquet files coalesces parts into too few in-memory partitions

2017-10-13 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203877#comment-16203877 ] sam commented on SPARK-17998: - [~lwlin] I think this is a regression. We used to be able to easily control

[jira] [Commented] (SPARK-22225) wholeTextFilesIterators

2017-10-10 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198326#comment-16198326 ] sam commented on SPARK-22225: - Thanks [~srowen] and [~hyukjin.kwon], I wasn't aware of either of these

[jira] [Commented] (SPARK-18965) wholeTextFiles() is not able to read large files

2017-10-09 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196872#comment-16196872 ] sam commented on SPARK-18965: - [~pradeep_misra] [~srowen]. Yes it's a new feature. What we need is this:

[jira] [Created] (SPARK-22225) wholeTextFilesIterators

2017-10-09 Thread sam (JIRA)
sam created SPARK-22225: --- Summary: wholeTextFilesIterators Key: SPARK-22225 URL: https://issues.apache.org/jira/browse/SPARK-22225 Project: Spark Issue Type: New Feature Components: Spark

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-21 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057592#comment-16057592 ] sam commented on SPARK-21137: - [~srowen] I thought I already made a point about that? Please can you tell me

[jira] [Comment Edited] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054176#comment-16054176 ] sam edited comment on SPARK-21137 at 6/19/17 3:20 PM: -- [~srowen] Ah OK, sorry, not

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054176#comment-16054176 ] sam commented on SPARK-21137: - [~srowen] Ah OK, sorry, not used to that process. On other projects I've seen

[jira] [Comment Edited] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054111#comment-16054111 ] sam edited comment on SPARK-21137 at 6/19/17 2:36 PM: -- [~srowen] > what stages are

[jira] [Comment Edited] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054111#comment-16054111 ] sam edited comment on SPARK-21137 at 6/19/17 2:35 PM: -- [~srowen] > what stages are

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054111#comment-16054111 ] sam commented on SPARK-21137: - [~srowen] > what stages are executing if any? *None, no tasks are started*.

[jira] [Updated] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-21137: Description: A very common use case in big data is to read a large number of small files. For example the Enron

[jira] [Comment Edited] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054026#comment-16054026 ] sam edited comment on SPARK-21137 at 6/19/17 1:53 PM: -- [~srowen] As I said in the

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054026#comment-16054026 ] sam commented on SPARK-21137: - [~srowen] As I said in the description, which you may have missed, the logs

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053977#comment-16053977 ] sam commented on SPARK-21137: - [~srowen] So I've provided full reproduce steps here (including code and

[jira] [Updated] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-21137: Description: A very common use case in big data is to read a large number of small files. For example the Enron

[jira] [Comment Edited] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053808#comment-16053808 ] sam edited comment on SPARK-21137 at 6/19/17 11:14 AM: --- [~srowen] Sorry about the

[jira] [Reopened] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam reopened SPARK-21137: - Reopened after adding detail. > Spark cannot read many small files (wholeTextFiles) >

[jira] [Updated] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-21137: Description: A very common use case in big data is to read a large number of small files. For example the Enron

[jira] [Updated] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-21137: Description: A very common use case in big data is to read a large number of small files. For example the Enron

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053808#comment-16053808 ] sam commented on SPARK-21137: - [~srowen] Sorry about the lack of detail Sean. I guess I just assumed this

[jira] [Created] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
sam created SPARK-21137: --- Summary: Spark cannot read many small files (wholeTextFiles) Key: SPARK-21137 URL: https://issues.apache.org/jira/browse/SPARK-21137 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-5159) Thrift server does not respect hive.server2.enable.doAs=true

2017-02-01 Thread Sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848259#comment-15848259 ] Sam commented on SPARK-5159: We are still having exactly this issue, any advice would be greatly appreciated

[jira] [Commented] (SPARK-11075) Spark SQL Thrift Server authentication issue on kerberized yarn cluster

2017-02-01 Thread Sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848257#comment-15848257 ] Sam commented on SPARK-11075: - We are still having exactly this issue, any advice would be greatly

[jira] [Commented] (SPARK-16666) Kryo encoder for custom complex classes

2016-08-05 Thread Sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410454#comment-15410454 ] Sam commented on SPARK-16666: - [~clockfly] in your code sample, there is a case class for Point, not esri's

[jira] [Updated] (SPARK-16666) Kryo encoder for custom complex classes

2016-07-21 Thread Sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam updated SPARK-16666: Description: I'm trying to create a dataset with some geo data using spark and esri. If `Foo` only have `Point`

[jira] [Updated] (SPARK-16666) Kryo encoder for custom complex classes

2016-07-21 Thread Sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam updated SPARK-16666: Description: I'm trying to create a dataset with some geo data using spark and esri. If `Foo` only have `Point`

[jira] [Updated] (SPARK-16666) Kryo encoder for custom complex classes

2016-07-21 Thread Sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam updated SPARK-16666: Description: I'm trying to create a dataset with some geo data using spark and esri. If `Foo` only have `Point`

[jira] [Created] (SPARK-16666) Kryo encoder for custom complex classes

2016-07-21 Thread Sam (JIRA)
Sam created SPARK-16666: --- Summary: Kryo encoder for custom complex classes Key: SPARK-16666 URL: https://issues.apache.org/jira/browse/SPARK-16666 Project: Spark Issue Type: Question

[jira] [Commented] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-27 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1503#comment-1503 ] sam commented on SPARK-11853: - Fair enough [~srowen] I'll concede that dependency management is inherently

[jira] [Updated] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-23 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11853: Affects Version/s: (was: 1.5.1) 1.5.0 > java.lang.ClassNotFoundException with

[jira] [Commented] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-21 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020545#comment-15020545 ] sam commented on SPARK-11853: - OK, I'll look into that next week and see if I can put together a minimal

[jira] [Commented] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-21 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020381#comment-15020381 ] sam commented on SPARK-11853: - spark-submit --master yarn-client --class my.class.Main my.jar --my --jar

[jira] [Commented] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-20 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018032#comment-15018032 ] sam commented on SPARK-11853: - [~srowen] We are not using --jars or anything like that, just executor cores,

[jira] [Comment Edited] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-20 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018032#comment-15018032 ] sam edited comment on SPARK-11853 at 11/20/15 1:40 PM: --- [~srowen] We are not using

[jira] [Commented] (SPARK-11853) java.lang.ClassNotFoundException for no reason

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014149#comment-15014149 ] sam commented on SPARK-11853: - [~srowen] Take another look, I edited the description shortly after creating

[jira] [Commented] (SPARK-11854) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014144#comment-15014144 ] sam commented on SPARK-11854: - [~srowen] It's on emr-4.1.0 with latest Spark EMR uses (so 1.5.1 I guess). My

[jira] [Commented] (SPARK-11853) java.lang.ClassNotFoundException for no reason

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014667#comment-15014667 ] sam commented on SPARK-11853: - Just ran a simple 1 line app with `sc.makeRDD` locally trying to reproduce,

[jira] [Closed] (SPARK-11854) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam closed SPARK-11854. --- Tried to produce a minimal app to reproduce, couldn't, probably issue lies between keyboard and chair. I assumed it

[jira] [Updated] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11853: Summary: java.lang.ClassNotFoundException with spray-json on EMR (was: java.lang.ClassNotFoundException for no

[jira] [Reopened] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam reopened SPARK-11853: - > java.lang.ClassNotFoundException with spray-json on EMR > ---

[jira] [Updated] (SPARK-11853) java.lang.ClassNotFoundException for no reason

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11853: Description: I'm using a fat jar (sbt assembly), so there is no reason for spark to do this. MORE INFO

[jira] [Commented] (SPARK-11853) java.lang.ClassNotFoundException for no reason

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014635#comment-15014635 ] sam commented on SPARK-11853: - [~srowen] // you're not sure what version you're running here // Confirmed

[jira] [Comment Edited] (SPARK-11854) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014144#comment-15014144 ] sam edited comment on SPARK-11854 at 11/19/15 10:56 PM: [~srowen] It's on

[jira] [Commented] (SPARK-3877) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014683#comment-15014683 ] sam commented on SPARK-3877: Actually ignore, as per comment in duplicate, can't seem to reproduce. > The

[jira] [Updated] (SPARK-11853) java.lang.ClassNotFoundException with spray-json on EMR

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11853: Description: I'm using a fat jar (sbt assembly), so there is no reason for spark to do this. MORE INFO

[jira] [Updated] (SPARK-11853) java.lang.ClassNotFoundException for no reason

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11853: Description: I'm using a fat jar (sbt assembly), so there is no reason for spark to do this. MORE INFO

[jira] [Updated] (SPARK-11853) java.lang.ClassNotFoundException for no reason

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11853: Description: I'm using a fat jar (sbt assembly), so there is no reason for spark to do this. MORE INFO

[jira] [Updated] (SPARK-11853) java.lang.ClassNotFoundException for no reason

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11853: Description: I'm using a fat jar (sbt assembly), so there is no reason for spark to do this. MORE INFO

[jira] [Created] (SPARK-11853) java.lang.ClassNotFoundException for no reason

2015-11-19 Thread sam (JIRA)
sam created SPARK-11853: --- Summary: java.lang.ClassNotFoundException for no reason Key: SPARK-11853 URL: https://issues.apache.org/jira/browse/SPARK-11853 Project: Spark Issue Type: Bug Affects

[jira] [Updated] (SPARK-11854) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11854: Affects Version/s: (was: 1.1.0) 1.5.1 > The exit code of spark-submit is still 0 when

[jira] [Updated] (SPARK-11854) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11854: Fix Version/s: (was: 1.2.0) (was: 1.1.1) > The exit code of spark-submit is still 0

[jira] [Created] (SPARK-11854) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
sam created SPARK-11854: --- Summary: The exit code of spark-submit is still 0 when an yarn application fails Key: SPARK-11854 URL: https://issues.apache.org/jira/browse/SPARK-11854 Project: Spark Issue

[jira] [Updated] (SPARK-11854) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11854: Target Version/s: (was: 1.1.1, 1.2.0) > The exit code of spark-submit is still 0 when an yarn application fails

[jira] [Updated] (SPARK-11854) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-11854: Description: When an yarn application fails (-yarn-cluster- yarn-client mode), the exit code of spark-submit is

[jira] [Commented] (SPARK-3877) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-18 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011690#comment-15011690 ] sam commented on SPARK-3877: Is this really fixed?? I'm getting this on 1.5.0 using EMR. [~tgraves]

[jira] [Commented] (SPARK-4492) Exception when following SimpleApp tutorial java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil

2015-07-30 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647446#comment-14647446 ] sam commented on SPARK-4492: I imagine building a fat jar for running with `java -cp` is
