Re: LLM script for error message improvement

Maciej Fri, 04 Aug 2023 01:41:22 -0700

Besides, in case a separate discussion doesn't happen, our core responsibility is to follow the ASF guidelines, including the ASF Generative Tooling Guidance (https://www.apache.org/legal/generative-tooling.html).

As far as I understand it, neither the first acceptance condition (which explicitly mentions ChatGPT) nor the third is satisfied by this PR or the other PR mentioned.


On a side note, we should probably take a closer look at the following:

'When providing contributions authored using generative AI tooling, a recommended practice is for contributors to indicate the tooling used to create the contribution. This should be included as a token in the source control commit message, for example including the phrase “Generated-by: ”.'


and consider adjusting PR template / merge tool accordingly.

Best regards,
Maciej Szymkiewicz

Web:https://zero323.net
PGP: A30CEF0C31A501EC

On 8/3/23 22:14, Maciej wrote:

I am sitting on the fence about that. In the linked PR Xiao wrote the following:

> We published the error guideline a few years ago, but not all contributors adhered to it, resulting in variable quality in error messages.

If a policy exists but is not enforced (if that's indeed the case, I didn't go through the source to confirm that), it might be useful to learn the reasons why it happens. Normally, I'd expect one of the following:

- Policy is too complex to enforce. In such a case, additional tooling can be useful.
- Policy is not well known, and the people responsible for introducing it are not committed to enforcing it.
- Policy or some of its components don't really reflect community values and expectations.

If the problem of suspected violations was never raised on our standard communication channel, and as far as I can tell, it has not, then introducing a new tool to enforce the policy seems a bit premature.

If these were the only considerations, I'd say that improving the overall consistency of the project outweighs possible risks, even if the case for such might be poorly supported.

However, there is an elephant in the room. It is another attempt, after SPARK-44546, to embed generative tools directly within the Spark dev workflow. In principle, I am not against such tools. In fact, it is pretty clear that they are already used by Spark committers, and even if we wanted to, there is little we can do to prevent that. In such cases, decisions about which tools, if any, to use, to what extent, and how to treat their output are the sole responsibility of contributors.

In contrast, these proposals try to push a proprietary tool burdened with serious privacy and ethical issues and likely to introduce unclear liabilities as a standard or even required developer tool. I can't speak for others, but personally, I'm quite uneasy about it.

If we go this way, I strongly believe that it should be preceded by a serious discussion, if not the development of a formal policy, about what categories of tools, in what capacity, and to what extent, are acceptable within the project. Ideally, with an official opinion from the ASF as the copyright owner.

WDYT All? Shall we start a separate discussion?

Best regards,
Maciej Szymkiewicz

Web:https://zero323.net
PGP: A30CEF0C31A501EC
On 8/3/23 18:33, Haejoon Lee wrote:
Additional information:
Please check https://issues.apache.org/jira/browse/SPARK-37935 if you want to start contributing to improving error messages.
You can create sub-tasks if you believe there are error messages that need improvement, in addition to the tasks listed in the umbrella JIRA.
You can also refer to https://github.com/apache/spark/pull/41504 and https://github.com/apache/spark/pull/41455 as example PRs.
On Thu, Aug 3, 2023 at 1:10 PM Ruifeng Zheng <ruife...@apache.org> wrote:

    +1 from my side, I'm fine to have it as a helper script

    On Thu, Aug 3, 2023 at 10:53 AM Hyukjin Kwon
    <gurwls...@apache.org> wrote:

        I think adding that dev tool script to improve the error
        message is fine.

        On Thu, 3 Aug 2023 at 10:24, Haejoon Lee
        <haejoon....@databricks.com.invalid> wrote:

            Dear contributors, I hope you are doing well!

            I see there are contributors who are interested in
            working on error message improvements and persistent
            contribution, so I want to share an llm-based error
            message improvement script for helping your contribution.

            You can find details of the script at
            https://github.com/apache/spark/pull/41711. I believe
            this can help your error message improvement work, so I
            encourage you to take a look at the pull request and
            leverage the script.

            Please let me know if you have any questions or concerns.

            Thanks all for your time and contributions!

            Best regards,

            Haejoon
