Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Hyukjin Kwon
I actually recently used 'Incomplete' a bit when the JIRA is basically too poorly formed (like just copying and pasting an error) ... I was thinking about 'Unresolved' status or 'Auto Closed' too. I double checked they can be reopened as well after resolution. [image: Screen Shot 2019-05-16 at

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Hyukjin Kwon
Oh, wait. 'Incomplete' can still make sense in this way then. Yes, I am good with 'Incomplete' too. On Thu, May 16, 2019 at 11:24 AM, Hyukjin Kwon wrote: > I actually recently used 'Incomplete' a bit when the JIRA is basically > too poorly formed (like just copying and pasting an error) ... > > I was

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Sean Owen
Agree, anything without an Affected Version should be old enough to time out. I might use "Incomplete" or something as the status, as we haven't otherwise used that. Maybe that's simpler than a label. But, anything like that sounds good. On Wed, May 15, 2019 at 8:40 PM Hyukjin Kwon wrote: >

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Hyukjin Kwon
BTW, affected version became a required field (I don't remember exactly when; I believe it was around when we worked on Spark 2.3): [image: Screen Shot 2019-05-16 at 10.29.50 AM.png] So, including all EOL versions and affected versions not specified will roughly work. Using "Cannot Reproduce"

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-15 Thread Bobby Evans
It would allow for the columnar processing to be extended through the shuffle. So if I were doing, say, an FPGA-accelerated extension, it could replace the ShuffleExchangeExec with one that can take a ColumnarBatch as input instead of a Row. The extended version of the ShuffleExchangeExec could

Re: Suggestion on Join Approach with Spark

2019-05-15 Thread Chetan Khatri
Hello Nicholas, I sincerely apologise. Thanks On Wed, May 15, 2019 at 11:34 PM Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > This kind of question is for the User list, or for something like Stack > Overflow. It's not on topic here. > > The dev list (i.e. this list) is for

Re: Suggestion on Join Approach with Spark

2019-05-15 Thread Nicholas Chammas
This kind of question is for the User list, or for something like Stack Overflow. It's not on topic here. The dev list (i.e. this list) is for discussions about the development of Spark itself. On Wed, May 15, 2019 at 1:50 PM Chetan Khatri wrote: > Any one help me, I am confused. :( > > On

Re: Suggestion on Join Approach with Spark

2019-05-15 Thread Chetan Khatri
Any one help me, I am confused. :( On Wed, May 15, 2019 at 7:28 PM Chetan Khatri wrote: > Hello Spark Developers, > > I have a question on Spark Join I am doing. > > I have a full load data from RDBMS and storing at HDFS let's say, > > val historyDF =

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-15 Thread Imran Rashid
Sorry I am late to the discussion here -- the JIRA mentions using these extensions for dealing with shuffles, can you explain that part? I don't see how you would use this to change shuffle behavior at all. On Tue, May 14, 2019 at 10:59 AM Thomas Graves wrote: > Thanks for replying, I'll extend

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Josh Rosen
+1 in favor of some sort of JIRA cleanup. My only request is that we attach some sort of 'bulk-closed' label to issues that we close via JIRA filter batch operations (and resolve the issues as "Timed Out" / "Cannot Reproduce", not "Fixed"). Using a label makes it easier to audit what was closed,

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Sean Owen
I gave up looking through JIRAs a long time ago, so, big respect for continuing to try to triage them. I am afraid we're missing a few important bug reports in the torrent, but most JIRAs are not well-formed, just questions, stale, or simply things that won't be added. I do think it's important to

Suggestion on Join Approach with Spark

2019-05-15 Thread Chetan Khatri
Hello Spark Developers, I have a question on Spark Join I am doing. I have a full load data from RDBMS and storing at HDFS let's say, val historyDF = spark.read.parquet("/home/test/transaction-line-item") and I am getting changed data at a separate HDFS path, let's say; val deltaDF =

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Hyukjin Kwon
Yea, a more sophisticated condition is welcome. My only goal is to get it down to a manageable size. I would go for the option that reduces more tickets - under 1000 OPEN (and REOPEN) tickets so that we can at least go through in one go without coming up with a non-duplicating filter to go through. On

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Abdeali Kothari
Was thinking that getting an estimated statistic of the number of issues that would be closed if this is done would help. Open issues: 3882 (project = SPARK AND status in (Open, "In Progress", Reopened)) Open + Does not affect 3.0+ = 2795 Open + Does not affect 2.4+ = 2373 Open + Does not affect

Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Hyukjin Kwon
Hi all, I would like to propose resolving all JIRAs that affect EOL releases (2.2 and below) and those with no affected version specified. I was rather against this approach and considered it a last resort roughly 3 years ago when we discussed it. Now I think we should go ahead with this. See below. I