Re: [CALL FOR TEST DATA] Request help identifying public domain or opensource test data sets for Metron testing

Otto Fowler Thu, 04 May 2017 16:22:51 -0700

I will just add that METRON-777, which is out for review, significantly
changes how parsers are created and packaged, though not the parser code
and interfaces. If you have parsers and it lands first, you will have to
refactor, but it is not difficult, and I will help you.


On May 4, 2017 at 18:27:56, Matt Foley ([email protected]) wrote:

> Hi Dima,
> In terms of process, I’m not aware of any changes to the below. In
> response to your specific questions:
>
> 1 and 2. The individual CLA is always required, as it establishes your
> authority to make your contributions. It is up to you to determine
> whether the corporate CLA is needed, but if your employer has ownership of
> the code you write, then you’ll want to get the CCLA signed for your own
> protection as well as Apache’s. The following Apache document section says
> everything better than I could:
> https://www.apache.org/dev/new-committers-guide.html#cla
>
> BTW, this is in the context of a new committer signing the ICLA, but a
> non-committer contributor contemplating a significant submission can and
> should also sign the ICLA. You just won’t get an apache email id yet :-)
>
> 3. After you sign up on Github and Apache Jira, drop an email to Casey
> asking him to give you karma on Jira so you can self-assign tickets and
> change their status.
>
> 5. Within reason. You don’t need separate jiras for the little pieces. But
> probably two per sensor (for parser and test code generator) would make
> sense. Basically you should try to split the work into independent
> reviewable chunks, and have a jira for each chunk. Several small PRs are a
> lot easier to review and test than one huge one.
>
> 6. The email isn’t really a requirement, it’s more to smooth the way.
> Think of it as an opportunity to send a cover letter to precede your PR.
>
> Expectations? The most common response is encouragement. Which is good,
> because you’re trying to recruit interest. You’ll need people to put in
> significant time to review your contributions when you make the PR. If you
> ask questions, advice and suggestions are usually readily offered.
>
> If you were suggesting architecture changes, you’d definitely want to
> discuss them in the email list *before* doing the work to implement them,
> because the PMC members have to approve any architecture changes, and you
> wouldn’t want to get a rejection in the PR after doing all that work. There
> can be some controversy about architecture issues, but such discussions
> should be driven solely by technical merit, and stay professional and
> friendly in tone. Issues of maintainability, usability, testability, and
> performance are fair game, as well as features. Consistency with existing
> architecture is encouraged.
>
> But you’re talking about parsers, which are pretty much a plug-in model
> with a standard interface you shouldn’t need to change. So the email is
> just a “hey remember those parsers we talked about? They’re coming shortly”
> message. If you have architecture concerns, or want to clarify anything
> before doing the submission, by all means bring them up too.
>
> 8. How to create a pull request:
>
> Make a fork of Metron in github, if you haven’t already, and create a
> branch named METRON-XXXX (the jira number your PR will address). Make sure
> the branch is updated to current Apache master, then merge in your work
> (for that Jira only), commit, and push to your github fork. Now browse to
> your fork in Github, and select the METRON-XXXX branch, then select the
> “Pull requests” tab at the top of the page. On this page there’s a big
> button labeled “New pull request”. Click it, and adjust:
> * base fork: apache/metron
> * base: master
> * head fork: <your-github-name>/metron
> * compare: METRON-XXXX
> From here it should be self-explanatory. It will construct the PR and ask
> you to fill in a template. You can see the diffs that reviewers will see.
> When you finalize the PR, it will automatically be published to the dev@
> mailing list.
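
The branch and pull-request steps above can be sketched as a small Python helper. This is only an illustration under stated assumptions, not part of Metron: the function name `pr_plan`, the remote name `upstream`, and the clone URL `https://github.com/apache/metron.git` are hypothetical choices, and the compare URL is simply GitHub's usual way of opening the "New pull request" page with base and head pre-selected. The helper builds the commands and the URL; it does not run git.

```python
import re

# Branch names follow the convention described above: METRON-<jira number>.
JIRA_BRANCH = re.compile(r"^METRON-\d+$")

# Assumed clone URL for the Apache repository (the "base fork" apache/metron).
UPSTREAM_URL = "https://github.com/apache/metron.git"


def pr_plan(github_user, branch, upstream_remote="upstream"):
    """Return (git commands, compare URL) for one Jira-scoped pull request.

    The commands are returned as argument lists and are NOT executed here.
    """
    if not JIRA_BRANCH.match(branch):
        raise ValueError("branch should be named after the Jira, e.g. METRON-1234")

    commands = [
        # Point a remote at Apache master so the branch starts up to date.
        ["git", "remote", "add", upstream_remote, UPSTREAM_URL],
        ["git", "fetch", upstream_remote],
        # Create the Jira branch from current Apache master.
        ["git", "checkout", "-b", branch, upstream_remote + "/master"],
        # (merge in the work for this Jira only and commit it here)
        # Publish the branch to your own GitHub fork.
        ["git", "push", "-u", "origin", branch],
    ]

    # base fork apache/metron, base master, head fork <user>/metron, compare <branch>
    compare_url = "https://github.com/apache/metron/compare/master...{}:{}".format(
        github_user, branch
    )
    return commands, compare_url


if __name__ == "__main__":
    cmds, url = pr_plan("your-github-name", "METRON-1234")
    for cmd in cmds:
        print(" ".join(cmd))
    print(url)
```

Keeping one branch per Jira, as suggested above, means each compare URL shows reviewers only the diff for that ticket.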
>
> Hope this helps,
> --Matt
>
>
>
> On 5/4/17, 2:43 AM, "Dima Kovalyov" <[email protected]> wrote:
>
> Hello Matt,
>
> It has taken us a long time to get back to working in this direction.
> Thank you for the response.
>
> I wanted to ask if anything has changed since our last discussion regarding
> contributing parsers, enrichments and generators. Is there anything else we
> should be doing other than:
> 1. Sign a Corporate CLA with Apache (link
> <https://www.apache.org/licenses/#clas>).
> 2. Sign an Individual CLA for the submitter (instructions
> <https://www.apache.org/licenses/#clas>). Do I need to do that despite #1?
> 3. Register on Apache GitHub and JIRA.
> 4. Open a JIRA master ticket for submissions from SSTECH.
> 5. Create a sub-task for each piece of code we are going to submit.
> 6. Send an email to the dev mailing list describing the proposed changes and
> referencing the JIRA case. What should we expect in response: approval or
> suggestions?
> 7. Fork Apache Metron master branch internally, merge our changes and test
> them using single-node vagrant.
> 8. Create a Pull Request (PR). How?
> 9. Wait for the dev team to review, accept changes and answer any
> questions or suggestions.
>
> The above applies to code that:
> 1. Is written and tested.
> 2. Is covered by unit tests.
> 3. Can be built using Maven.
> 4. Has a place in the Apache Metron folder tree.
>
> - Dima
>
>
> On 10/08/2016 06:43 AM, Matt Foley wrote:
> Hi Dima,
> Sorry this is getting a little long, but the TL;DR on the Metron Development
> Environment Setup Instructions
> <https://cwiki.apache.org/confluence/display/METRON/Metron+Development+Environment+Setup+Instructions>
> is:
>
> A. Open a Jira for the work you want to do, or the contribution you want
> to make. Since you have several parsers, you might open an umbrella Jira,
> with four subtask jiras, each of which includes the parser and test data
> generator for one of the four technologies you mentioned.
> B. Send an email to the dev list proposing what you want to submit, and
> referencing the Jira.
> C. Fork the Apache Metron code base in your personal github area.
> D. Make sure your contribution works correctly with the latest master
> branch code.
> E. Decide where in the code tree your contribution would fit best. The
> parsers themselves would of course go under
> metron-platform/metron-parsers/. The data generators could reasonably be
> put in the test/ subdirectory, perhaps under
> metron-platform/metron-parsers/src/test/java/org/apache/metron/writers
> (although we would defer to the reviewers).
> F. Add the necessary maven glue so the new pieces build along with the
> core.
> G. Metron requires all submissions to have unit tests with thorough
> coverage, so add those if they aren’t there yet.
> H. When things are ready to submit, commit everything to your github, and
> create a Pull Request (PR).
> I. Watch the PR and Jira for responses. Respond to questions, accept
> feedback or suggest alternative solutions, and work through the process
> with the community. If things need lengthy discussion, you may be asked to
> continue it on the dev list.
> J. With patience, all issues will be agreed on, and the contribution will
> be accepted into Metron, for the benefit of the whole community.
>
> Hope this helps. Feel free to contact me directly, or just ask questions
> on the dev list.
> Best regards,
> —Matt
>
>
> On Oct 7, 2016, at 6:05 PM, Matt Foley <[email protected]> wrote:
>
> Dima, that’s great!
>
> Since you’re talking about a code contribution (or several :-), let’s move
> the discussion over to the dev list after this response. Briefly,
> here’s how you submit a contribution.
>
> First the housekeeping:
> 1. If Sstech has not yet signed a Corporate CLA with Apache, please ask
> them to do so (instructions<https://www.apache.org/licenses/#clas>)
> 2. If you, or a colleague who will submit the contributions, has not yet
> signed an Individual CLA, please do so (instructions<
> https://www.apache.org/licenses/#clas>)
>
> Since you’ve been successfully writing Metron parsers, you almost
> certainly have already done the following, but I’ll mention them here for
> the sake of other readers:
> 3. If you’re not on the dev mailing list, please join it (instructions<
> https://cwiki.apache.org/confluence/display/METRON/Community+Resources>)
> 4. If you weren’t a registered user of Apache’s Jira, you would request to
> be added, but I see you already are, so that’s good.
> 5. If you don’t yet have an account on Github.com<http://github.com/>,
> sign up for one (the free level is fine).
> 6. Set up a Metron Development Environment, and establish the ability to
> spin up a single-node test environment (instructions<
> https://cwiki.apache.org/confluence/display/METRON/Metron+Development+Environment+Setup+Instructions>)
>
>
> To actually make the contribution, you follow the process shown in:
>
> https://cwiki.apache.org/confluence/display/METRON/Metron+Development+Environment+Setup+Instructions
>
> I’ll go into more detail in a direct email.
> Thanks a lot for being interested in submitting these!
>
> Cheers,
> —Matt
>
> ________________________________
> From: Dima Kovalyov <[email protected]>
> Sent: Friday, October 07, 2016 4:44 PM
> To: [email protected]; Satish Abburi
> Subject: Re: [CALL FOR TEST DATA] Request help identifying public domain
> or opensource test data sets for Metron testing
>
> Hello Matt,
>
> We (the Sstech team) currently have parsers and data generators for BlueCoat,
> Unix, MS Exchange, and MS Windows, and we would gladly contribute them.
>
> Can you please share the procedure for submitting these pieces?
> Thank you.
>
> - Dima
>
> On 10/08/2016 01:49 AM, Matt Foley wrote:
> Hi all,
> Enhanced testing of Metron, especially performance testing, would be aided
> by having data sets of realistic size, that exercise one or more of the
> various parts of Metron:
>
> * each Parser (bro, yaf, snort, squid, ...)
> * each Enhancer (geo, user, assets, ...)
> * each Threat Intel module (Soltra, HailATaxi, ...)
>
> Data sets must meet the following criteria:
>
> * opensource or public domain
> * suitably scrubbed, containing no Personally Identifiable Information
> * unencumbered by company sensitivity, security, or IP concerns.
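
To illustrate what "suitably scrubbed" might involve in practice, here is a minimal Python sketch that replaces IPv4 addresses and email addresses in a log line with salted pseudonyms. It is an assumption-laden example, not a Metron tool: the names `scrub_line` and `_token`, the regular expressions, and the salt value are all hypothetical, and real data sets would need a much broader review (hostnames, usernames, URLs, payloads) before being published.

```python
import hashlib
import re

# Deliberately simple patterns; they are illustrative, not exhaustive.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def _token(kind, value, salt):
    """Map a sensitive value to a stable pseudonym such as <ip-1a2b3c4d>."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:8]
    return "<{}-{}>".format(kind, digest)


def scrub_line(line, salt="change-me"):
    """Replace email addresses and IPv4 addresses with salted pseudonyms.

    The same input value always maps to the same token for a given salt,
    so relationships between records survive the scrubbing.
    """
    line = EMAIL.sub(lambda m: _token("email", m.group(0), salt), line)
    line = IPV4.sub(lambda m: _token("ip", m.group(0), salt), line)
    return line


if __name__ == "__main__":
    print(scrub_line("10.0.0.5 GET /index.html user=bob@example.com 10.0.0.5"))
```

Using a stable pseudonym rather than a blank keeps the data useful for testing enrichment and correlation, since repeated addresses still line up across records.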
>
> They may take the form of raw PCAP streams, or they may be already parsed
> or otherwise pre-processed.
>
> If you know of opensource or public domain data sets of this kind, please
> respond with the URL, in this email thread or to the Jira ticket METRON-491<
> https://issues.apache.org/jira/browse/METRON-491>.
>
> If you have an appropriate data set that your company would be willing to
> contribute, please also respond and we will help in any way we can.
>
> 
> Thanks,
> --Matt
