I opened a PR - https://github.com/apache/spark-website/pull/232
2019년 11월 19일 (화) 오전 9:22, Hyukjin Kwon <[email protected]>님이 작성: > Let me document as below in few days: > > 1. For Python and Java, write a single comment that starts with JIRA ID > and short description, e.g. (SPARK-XXXXX: test blah blah) > 2. For R, use JIRA ID as a prefix for its test name. > > assuming everybody is happy. > > 2019년 11월 18일 (월) 오전 11:36, Hyukjin Kwon <[email protected]>님이 작성: > >> Actually there are not so many Java test cases in Spark (because Scala >> runs on JVM as everybody knows)[1]. >> >> Given that, I think we can avoid to put some efforts on this for now .. I >> don't mind if somebody wants to give a shot since it looks good anyway but >> to me I wouldn't spend so much time on this .. >> >> Let me just go ahead as I suggested if you don't mind. Anyone can give a >> shot for Display Name - I'm willing to actively review and help. >> >> [1] >> git ls-files '*Suite.java' | wc -l >> 172 >> git ls-files '*Suite.scala' | wc -l >> 1161 >> >> 2019년 11월 18일 (월) 오전 3:27, Steve Loughran <[email protected]>님이 작성: >> >>> Test reporters do often contain some assumptions about the characters in >>> the test methods. Historically JUnit XML reporters have never sanitised the >>> method names so XML injection attacks have been fairly trivial. Haven't >>> tried this for a while. >>> >>> That whole JUnit XML report "standard" was actually put together in the >>> Ant project with <Junitreport> doing the postprocessing of the JUnit run. >>> It was driven by the team's XSL skills than any overreaching strategic goal >>> about how to present test results of tests which could run for hours and >>> whose output you would really want to aggregate the locks from multiple >>> machines and processes and present in awake you can actually navigate. With >>> hindsight, a key failing is that we chose to store the test summaries (test >>> count, failure count...) as attributes on the root XML mode. Which is why >>> the whole DOM gets built up in the JUnit runner. Which is why when that >>> JUnit process crashes, you get no report at all. >>> >>> It'd be straightforward to fix -except too much relies on that file >>> now...important things will break. And the maven runner has historically >>> never supported custom reporters, to let you experiment with it. >>> >>> Maybe this is an opportunity to change things. >>> >>> On Sun, Nov 17, 2019 at 1:42 AM Hyukjin Kwon <[email protected]> >>> wrote: >>> >>>> DisplayName looks good in general but actually here I would like first >>>> to find a existing pattern to document in guidelines given the actual >>>> existing practice we all are used to. I'm trying to be very conservative >>>> since this guidelines affect everybody. >>>> >>>> I think it might be better to discuss separately if we want to change >>>> what we have been used to. >>>> >>>> Also, using arbitrary names might not be actually free due to such bug >>>> like https://github.com/apache/spark/pull/25630 . It will need some >>>> more efforts to investigate as well. >>>> >>>> On Fri, 15 Nov 2019, 20:56 Steve Loughran, <[email protected]> >>>> wrote: >>>> >>>>> Junit5: Display names. >>>>> >>>>> Goes all the way to the XML. >>>>> >>>>> >>>>> https://junit.org/junit5/docs/current/user-guide/#writing-tests-display-names >>>>> >>>>> On Thu, Nov 14, 2019 at 6:13 PM Shixiong(Ryan) Zhu < >>>>> [email protected]> wrote: >>>>> >>>>>> Should we also add a guideline for non Scala tests? Other languages >>>>>> (Java, Python, R) don't support using string as a test name. >>>>>> >>>>>> Best Regards, >>>>>> Ryan >>>>>> >>>>>> >>>>>> On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> I opened a PR - https://github.com/apache/spark-website/pull/231 >>>>>>> >>>>>>> 2019년 11월 13일 (수) 오전 10:43, Hyukjin Kwon <[email protected]>님이 작성: >>>>>>> >>>>>>>> > In general a test should be self descriptive and I don't think we >>>>>>>> should be adding JIRA ticket references wholesale. Any action that the >>>>>>>> reader has to take to understand why a test was introduced is one too >>>>>>>> many. >>>>>>>> However in some cases the thing we are trying to test is very subtle >>>>>>>> and in >>>>>>>> that case a reference to a JIRA ticket might be useful, I do still feel >>>>>>>> that this should be a backstop and that properly documenting your >>>>>>>> tests is >>>>>>>> a much better way of dealing with this. >>>>>>>> >>>>>>>> Yeah, the test should be self-descriptive. I don't think adding a >>>>>>>> JIRA prefix harms this point. Probably I should add this sentence in >>>>>>>> the >>>>>>>> guidelines as well. >>>>>>>> Adding a JIRA prefix just adds one extra hint to track down >>>>>>>> details. I think it's fine to stick to this practice and make it >>>>>>>> simpler >>>>>>>> and clear to follow. >>>>>>>> >>>>>>>> > 1. what if multiple JIRA IDs relating to the same test? we just >>>>>>>> take the very first JIRA ID? >>>>>>>> Ideally one JIRA should describe one issue and one PR should fix >>>>>>>> one JIRA with a dedicated test. >>>>>>>> Yeah, I think I would take the very first JIRA ID. >>>>>>>> >>>>>>>> > 2. are we going to have a full scan of all existing tests and >>>>>>>> attach a JIRA ID to it? >>>>>>>> Yea, let's don't do this. >>>>>>>> >>>>>>>> > It's a nice-to-have, not super essential, just because ... >>>>>>>> It's been asked multiple times and each committer seems having a >>>>>>>> different understanding on this. >>>>>>>> It's not a biggie but wanted to make it clear and conclude this. >>>>>>>> >>>>>>>> > I'd add this only when a test specifically targets a certain >>>>>>>> issue. >>>>>>>> Yes, so this one I am not sure. From what I heard, people adds the >>>>>>>> JIRA in cases below: >>>>>>>> >>>>>>>> - Whenever the JIRA type is a bug >>>>>>>> - When a PR adds a couple of tests >>>>>>>> - Only when a test specifically targets a certain issue. >>>>>>>> - ... >>>>>>>> >>>>>>>> Which one do we prefer and simpler to follow? >>>>>>>> >>>>>>>> Or I can combine as below (im gonna reword when I actually document >>>>>>>> this): >>>>>>>> 1. In general, we should add a JIRA ID as prefix of a test when a >>>>>>>> PR targets to fix a specific issue. >>>>>>>> In practice, it usually happens when a JIRA type is a bug or a >>>>>>>> PR adds a couple of tests. >>>>>>>> 2. Uses "SPARK-XXXX: test name" format >>>>>>>> >>>>>>>> If we have no objection with ^, let me go with this. >>>>>>>> >>>>>>>> 2019년 11월 13일 (수) 오전 8:14, Sean Owen <[email protected]>님이 작성: >>>>>>>> >>>>>>>>> Let's suggest "SPARK-12345:" but not go back and change a bunch of >>>>>>>>> test cases. >>>>>>>>> I'd add this only when a test specifically targets a certain issue. >>>>>>>>> It's a nice-to-have, not super essential, just because in the rare >>>>>>>>> case you need to understand why a test asserts something, you can >>>>>>>>> go >>>>>>>>> back and find what added it in the git history without much >>>>>>>>> trouble. >>>>>>>>> >>>>>>>>> On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon <[email protected]> >>>>>>>>> wrote: >>>>>>>>> > >>>>>>>>> > Hi all, >>>>>>>>> > >>>>>>>>> > Maybe it's not a big deal but it brought some confusions time to >>>>>>>>> time into Spark dev and community. I think it's time to discuss about >>>>>>>>> when/which format to add a JIRA ID as a prefix for the test case name >>>>>>>>> in >>>>>>>>> Scala test cases. >>>>>>>>> > >>>>>>>>> > Currently we have many test case names with prefixes as below: >>>>>>>>> > >>>>>>>>> > test("SPARK-XXXXX blah blah") >>>>>>>>> > test("SPARK-XXXXX: blah blah") >>>>>>>>> > test("SPARK-XXXXX - blah blah") >>>>>>>>> > test("[SPARK-XXXXX] blah blah") >>>>>>>>> > … >>>>>>>>> > >>>>>>>>> > It is a good practice to have the JIRA ID in general because, >>>>>>>>> for instance, >>>>>>>>> > it makes us put less efforts to track commit histories (or even >>>>>>>>> when the files >>>>>>>>> > are totally moved), or to track related information of tests >>>>>>>>> failed. >>>>>>>>> > Considering Spark's getting big, I think it's good to document. >>>>>>>>> > >>>>>>>>> > I would like to suggest this and document it in our guideline: >>>>>>>>> > >>>>>>>>> > 1. Add a prefix into a test name when a PR adds a couple of >>>>>>>>> tests. >>>>>>>>> > 2. Uses "SPARK-XXXX: test name" format which is used in our code >>>>>>>>> base most >>>>>>>>> > often[1]. >>>>>>>>> > >>>>>>>>> > We should make it simple and clear but closer to the actual >>>>>>>>> practice. So, I would like to listen to what other people think. I >>>>>>>>> would >>>>>>>>> appreciate if you guys give some feedback about when to add the JIRA >>>>>>>>> prefix. One alternative is that, we only add the prefix when the >>>>>>>>> JIRA's >>>>>>>>> type is bug. >>>>>>>>> > >>>>>>>>> > [1] >>>>>>>>> > git grep -E 'test\("\SPARK-([0-9]+):' | wc -l >>>>>>>>> > 923 >>>>>>>>> > git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l >>>>>>>>> > 477 >>>>>>>>> > git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l >>>>>>>>> > 16 >>>>>>>>> > git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l >>>>>>>>> > 13 >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> >>>>>>>>
