Hi Alex,
Some great ideas, all of which are deserving of comment.

>   - There is code commented out, but much of this code seems to still be
   valuable, like it was commented from some migrations and was left over for
   somebody to follow-up (e.g. unit tests).
True.  Some intelligence is required.  When in doubt, leave it - but there are 
a lot of things that are obviously moved or old rewritten code.  This is all 
volunteer and just getting people involved with "baby steps" would be great.  I 
would also hope that some inactive authors come back and clean up comments in 
their own code.  Or write those unit tests if that was the intention.  There 
are also TODO comments in the code that could be tackled.

>   - There are issues reported by SonarQube [1] like:
This should be handled with kid gloves.  A lot of those reports cover items 
that are not yet complete, ordered for easier following / understanding of 
code, etc.  However a lot can be handled easily and quickly, like adding 
@Override ...  People can use local plugins that check code, like FindBugs.  I 
used to be religious about it but have become lax.  This is a good reminder for 
me to start again.
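
To illustrate why the quick @Override fixes are worth doing, here is a minimal sketch.  TextSpan is a made-up class for illustration, not something from the cTAKES code base.  With the annotation in place, the compiler checks that the method really overrides the one in Object; without it, a mistyped parameter would quietly create an overload instead.

```java
import java.util.Objects;

// Hypothetical value class for illustration; not taken from the cTAKES code base.
final class TextSpan {
    private final int begin;
    private final int end;

    TextSpan(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    // @Override asks the compiler to verify that this method really overrides
    // Object.equals(Object). If the parameter were mistakenly typed as TextSpan,
    // the annotation would turn a silent overload into a compile error.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof TextSpan)) {
            return false;
        }
        TextSpan that = (TextSpan) other;
        return begin == that.begin && end == that.end;
    }

    // equals and hashCode are overridden together so hash-based collections work.
    @Override
    public int hashCode() {
        return Objects.hash(begin, end);
    }

    public static void main(String[] args) {
        Object a = new TextSpan(0, 5);
        Object b = new TextSpan(0, 5);
        System.out.println(a.equals(b)); // prints "true"
    }
}
```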

>   - Removal of hardcoded paths like: "/tmp",
I am in complete agreement.  Things like /tmp should probably even be 
refactored to use temp files.  Things like default paths used in static 
createAnnotatorDescription() should probably instead be declared in 
@ConfigurationParameter( defaultValue = ... )
--- Building upon that statement, it would be nice to migrate older annotation 
engines, readers, and CAS consumers to the uimaFIT paradigm.  This would help a 
newbie understand the difference and how to use AEs, etc.
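
For the /tmp point, a minimal sketch of the temp-file approach using only the standard library.  The class name, method name and the "ctakes-" prefix are illustrative assumptions, not existing cTAKES code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; the names and the "ctakes-" prefix are illustrative only.
final class ScratchFiles {
    private ScratchFiles() {
    }

    // Creates the file in the platform's default temp directory (java.io.tmpdir)
    // instead of a hardcoded "/tmp" or "C:/Users/..." path.
    static Path writeScratch(String content) {
        try {
            Path file = Files.createTempFile("ctakes-", ".txt");
            // Ask the JVM to remove the file on normal exit.
            file.toFile().deleteOnExit();
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = writeScratch("hello");
        System.out.println(file + " holds " + Files.size(file) + " bytes");
    }
}
```

The same code then runs unchanged on Linux, macOS and Windows, which a literal "/tmp" does not.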

>   - Migrate scripts from Ant (files like build-*.xml) to maven.
Does ctakes have these?  I guess that I've missed them.  Yeah, full maven would 
be nice.

>   - Deprecated code
We certainly have a lot of it.  It is a good excuse to make unit tests before 
updating.
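
A rough sketch of what "unit tests before updating" could look like: pin down what the deprecated code does today, then check that the replacement agrees.  Both methods below are hypothetical stand-ins, not real cTAKES methods, and a real test would live in JUnit rather than a main method.

```java
// Hypothetical example; legacyJoin and join stand in for a deprecated method
// and its replacement. Neither exists in cTAKES.
final class DeprecationSanityCheck {
    private DeprecationSanityCheck() {
    }

    // The "old" implementation that is about to be replaced.
    @Deprecated
    static String legacyJoin(String[] tokens) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(tokens[i]);
        }
        return sb.toString();
    }

    // The proposed replacement.
    static String join(String[] tokens) {
        return String.join(" ", tokens);
    }

    // Returns true only if old and new code agree on every sample input.
    static boolean sameBehavior(String[][] inputs) {
        for (String[] input : inputs) {
            if (!legacyJoin(input).equals(join(input))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[][] inputs = { {}, { "chest" }, { "chest", "pain" } };
        System.out.println(sameBehavior(inputs)); // prints "true"
    }
}
```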

>   - I think it is time to define some conventions for:
      - formatting (indentation),
      - crlf conventions (see .gitattributes)
      - etc
You are correct; indentation and crlf should also be settable by a decent IDE 
for any VCS.  I think that most ctakes code is space indented, 3 per 
indentation, and \n only for newlines.  I could be wrong.
Things that are more stylistic (naming, ordering, etc.) are much more a matter 
of coder preference.  I would rather have contributions than turn people off with 
strictures.  I'll even take things like missing { } ...  though there is 
another great target for refactoring ...

>   - For git vs Subversion, I am able to use the same folder with a .git
Thanks for the documentation!  As an Apache project we would need to vote on 
fully moving to git (as Tim and Dave suggested).  I am definitely not opposed 
to that - I use github for everything else these days ...

>   - There are commits without any reference to Jira issues or other type
Guilty as charged.  A lot of my commits are new development and I only write 
commit comments.  I could open a Jira for each, but I am admittedly lazy about 
such things.  Ditto for placing links in an email appendix.
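
If we ever wanted to check this mechanically, a commit message could be tested for an issue key.  This is only a sketch: it assumes keys of the usual Jira form CTAKES-<number>, and the class and the issue number in the example are made up.

```java
import java.util.regex.Pattern;

// Hypothetical check; assumes issue keys look like "CTAKES-123".
final class CommitMessageCheck {
    private static final Pattern ISSUE_KEY = Pattern.compile("\\bCTAKES-\\d+\\b");

    private CommitMessageCheck() {
    }

    // True if the commit message mentions at least one issue key.
    static boolean referencesIssue(String message) {
        return message != null && ISSUE_KEY.matcher(message).find();
    }

    public static void main(String[] args) {
        // The issue number below is invented for the example.
        System.out.println(referencesIssue("CTAKES-123 remove commented-out code")); // prints "true"
        System.out.println(referencesIssue("cleanup")); // prints "false"
    }
}
```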

> Also, based on the decision to use semantic versioning, it
   will need to choose between 4.0.1 or 4.1.0.
Personally I think that our next release should be 4.1.0 as there are enough 
new features to show that it isn't just a patch release.
http://semver.org/
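
For anyone unfamiliar with the semver rules behind that choice, here is a small sketch.  It assumes the current release is 4.0.0, and the class is purely illustrative: a patch bump is for backward-compatible bug fixes, a minor bump is for backward-compatible new features and resets the patch number.

```java
// Illustrative only; assumes plain MAJOR.MINOR.PATCH versions with no suffixes.
final class SemVerBump {
    enum Change { PATCH, MINOR, MAJOR }

    private SemVerBump() {
    }

    static String bump(String version, Change change) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected MAJOR.MINOR.PATCH: " + version);
        }
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        int patch = Integer.parseInt(parts[2]);
        switch (change) {
            case MAJOR:
                // Incompatible changes: reset minor and patch.
                return (major + 1) + ".0.0";
            case MINOR:
                // New backward-compatible features: reset patch.
                return major + "." + (minor + 1) + ".0";
            default:
                // Backward-compatible bug fixes only.
                return major + "." + minor + "." + (patch + 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(bump("4.0.0", Change.PATCH)); // prints "4.0.1"
        System.out.println(bump("4.0.0", Change.MINOR)); // prints "4.1.0"
    }
}
```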

Thanks,
Sean

-----Original Message-----
From: Alexandru Zbarcea [mailto:al...@apache.org] 
Sent: Monday, November 20, 2017 8:34 AM
To: Apache cTAKES Dev; Hadrian Zbarcea
Subject: Re: Contribute to ctakes: it is in your best interests! RE: unknown 
dependencies

Hi,

To grow the community and bring even more adoption is my desire, too. I cannot 
agree more with what you said, Sean, Tim.

I have discussed cTAKES adoption with Hadrian (Apache member) and I think 
he has great ideas about the priorities for this community to grow. I would like 
to introduce him to the community and let him express some ideas.

With regard to the technical issues that were already identified on this 
thread, I would like to understand your perspective and prioritization.

   - There is code commented out, but much of this code seems to still be
   valuable, like it was commented from some migrations and was left over for
   somebody to follow-up (e.g. unit tests).
   - There are issues reported by SonarQube [1] like:
      - 3.3K bugs [2]
      - 16.5% code duplication (24K LoC) [3]
      - 174 bugs in the last month [4]


   - I would like to see more Unit Tests for the code. There are new
   commits unrelated to any feature description, so there is no clear
   understanding about what the review should focus on. I think it relates to
   the same request from Sean to have "sanity-test type unit tests - Little
   two or three-line "does this method crack" tests.". I see this task as one
   of the most important ones.
   - Removal of hardcoded paths like: "/tmp",
   "C:/Users/<some-user>/<some-path>".
   - Migrate scripts from Ant (files like build-*.xml) to maven. The Ant
   scripts make the code so unpredictable. I find it difficult to navigate
   through these when tests are dependent upon these executions.
   - Classpaths manually specified.
   - Deprecated code
   - Old libraries which involve security risks in production (e.g. Spring
   that was just upgraded)

Other tasks that are related more to productivity.

   - I think it is time to define some conventions for:
      - formatting (indentation),
      - crlf conventions (see .gitattributes)
      - etc
   - For git vs Subversion, I am able to use the same folder with both .git
   and .svn VCS metadata, and I have documented this on the wiki [5].
   - There are commits without any reference to Jira issues or other type
   of documentation. In consequence, when a release comes, it will be very
   hard to hunt down those changes and understand why those commits were made:
   bugs vs features. Also, based on the decision to use semantic versioning, we
   will need to choose between 4.0.1 or 4.1.0.

My $0.02,
Alex

[1] -
https://builds.apache.org/analysis/overview?id=org.apache.ctakes%3Actakes
[2] -
https://builds.apache.org/analysis/component_issues?id=org.apache.ctakes%3Actakes#resolved=false%7Ctypes=BUG
[3] -
https://builds.apache.org/analysis/component_measures/metric/duplicated_blocks/list?id=org.apache.ctakes%3Actakes
[4] -
https://builds.apache.org/analysis/component_issues?id=org.apache.ctakes%3Actakes#resolved=false%7Ctypes=BUG%7CsinceLeakPeriod=true
[5] -
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+Developer+Install+Guide#cTAKES4.0DeveloperInstallGuide-Subversion+Git



On Mon, Nov 20, 2017 at 6:32 AM, Miller, Timothy < 
timothy.mil...@childrens.harvard.edu> wrote:

> Git is available to apache projects, and many projects have moved over 
> (see here: https://git-wip-us.apache.org/repos/asf).
> Here is the general info on what that looks like:
> https://www.apache.org/dev/writable-git
>
> A few points from that link:
> > Projects can request moving to Git as their main code repository, by
> > creating an INFRA issue. See also the infra-contact page.
> > Projects can request new, blank repositories by using reporeq.apache.org.
> > The current system has basic git support only. We are working on
> > extending this service in the near future.
> > Custom commit or other hooks will not be supported, all projects get
> > the same hooks. Setting up gitpubsub should provide sufficient flexibility
> > without impacting the core Git setup, volunteers are welcome to make
> > that happen.
>
> (Not sure what basic support only means.)
>
> There are also read-only git repos available by default for every 
> project and updated in near-real-time:
> https://www.apache.org/dev/git.html
>
> With those, I guess the suggested workflow is to work off of that repo 
> and then just submit patches to someone who commits with svn rather 
> than committing directly.
>
> I've been using the git-svn connector myself recently since I just 
> vastly prefer the git lightweight branching for focused development, 
> as it helps me keep a cleaner working directory. But that adds some 
> additional annoying steps.
>
> Tim
>
> ________________________________________
> From: Finan, Sean <sean.fi...@childrens.harvard.edu>
> Sent: Saturday, November 18, 2017 1:23 PM
> To: dev@ctakes.apache.org
> Subject: RE: Contribute to ctakes: it is in your best interests! RE:
> unknown dependencies
>
> Hi Dave,
>
> Those are some great thoughts.  Being an apache project I am not sure 
> how far we can move from svn, but there may be a way.  You are not the 
> first to voice this desire for an active github repo and I'm sure that 
> you won't be the last.
>
> I completely agree with your discussion board preference.  Do you have 
> any recommendations?
>
> You make a great point regarding documentation.  In reference to 
> things that anybody can quickly contribute ... that would be a big one.
> Volunteers?!?
>
> I am really happy to hear that you want to contribute - more than you 
> already have, which is actually quite a bit!
>
> Cheers,
> Sean
>
> -----Original Message-----
> From: David Kincaid [mailto:kincaid.d...@gmail.com]
> Sent: Saturday, November 18, 2017 1:10 PM
> To: dev@ctakes.apache.org
> Subject: Re: Contribute to ctakes: it is in your best interests! RE:
> unknown dependencies
>
> Sean, I can share a couple things that have been an obstacle for me. 
> It may seem a minor point to some, but I left Subversion behind years 
> ago and really have no desire to go back. If the project were moved 
> over to Git/Github it would really smooth the way for me at least. I 
> would be happy to help out with this. One of the other things I would 
> really like to see is the mailing list moved onto a discussion board 
> platform. It seems to me that a discussion board style of tool tends 
> to create a more active community than a mailing list does.
>
> The other thing that might help get new people involved is making it 
> easier to find information about the development environment. Things 
> like branching strategies, coding conventions, etc are really hard to 
> find from the main cTAKES web site. I saw some references to Jenkins 
> builds recently on the list. I had no idea there was a Jenkins CI 
> server for the project somewhere. It also takes some digging to find a 
> link to Jira. Maybe we could create a Wiki page that describes where 
> all these tools are and how they are used.
>
> You guys have really done some great work over the last couple of 
> years cleaning up the code base and improving the documentation by a 
> ton. Things like the fast dictionary annotator and the dictionary creator 
> GUI are great additions and make it a lot easier for other people to get 
> up and running more quickly. As I'm ramping up my research as well as 
> some proof of concept stuff at work I'll be working more and more with 
> cTAKES and would love to contribute more to the project.
>
> Just my thoughts.
>
> - Dave
>
>
> On Sat, Nov 18, 2017 at 11:10 AM, Finan, Sean < 
> sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Tim, Alex,
> >
> > Great ideas.  I like your (Tim) idea to 1. start with commented code 
> > removal.
> > Then maybe move on to
> > 2. sanity-test type unit tests - Little two or three-line "does this 
> > method crack" tests.
> > And another that is simply
> > 3. "populate a test cas with type(s) X" and a factory with 
> > "getSectionTestCas" "getSentenceTestCas" "getPosTestCas" "getChunkTestCas"
> > ...  just really simple reusables for tests.
> > Then
> > 4. refactor to extract and consolidate duplicate code - it is all 
> > over the place ...
> >
> > These are just my initial thoughts and suggestions, but I think that
> > those 4 tasks can be performed by anybody of any experience level.  They
> > build upon each other and should help the implementers better understand
> > ctakes.
> > After that the sky is the limit.
> >
> > A couple of years ago I sat on a panel at a workshop for open source 
> > scientific software.  For the half dozen or so highlighted projects 
> > (ctakes was one!) the common thread was that getting people to 
> > contribute is extremely difficult.
> > I have a tendency to assume that people always act in their best 
> > interests.  Any student thinking of going towards industry should be 
> > jumping at the opportunity to contribute to a large, 
> > production-quality project.  They should also realize that 
> > contribution means potential recommendation (and possibly hiring
> > interest) by established developers, physicians and researchers that 
> > use ctakes.  Even just answering questions on a user or dev list 
> > creates credibility and can build a network.
> > Active researchers could discover common thoughts and directions 
> > that could lead to collaboration outside ctakes.  Researchers and 
> > companies trying to build upon open source should realize that 
> > direct contribution is easier than custom substitution.  Plus, it is 
> > in their best interests that code does what they need it to do in 
> > the fastest, lightest, most stable way possible.
> > With a project like ctakes there are a lot of things that can be 
> > done, there are great opportunities to really shine.  "I wrote this 
> > tool for my thesis that performs some nlp task" sounds good.  
> > Appending "in an Apache product and it has been taken up by thousands 
> > across the globe"
> > makes it sound a lot better.
> > At my previous job in industry the company actively contributed to 
> > several open source projects.  We had a few people for whom that was 
> > 50% of their job.  Why?  Because we made a commitment to use that 
> > open source software.
> > It was a better use of our resources to contribute to it, improve it 
> > and keep its momentum going and prevent it from becoming stale (or
> > abandoned) while our software continued to move forward.
> >
> > Hmm, that was a touch more than I had planned to write.  A whole cup 
> > of coffee in that one.
> >
> > Sean
> >
> >
> >
> >
> > -----Original Message-----
> > From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> > Sent: Saturday, November 18, 2017 8:13 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: unknown dependencies
> >
> > Thanks Alex, looks like that was probably a fat-fingered auto-import 
> > on my part.
> >
> > I like your idea, and I don't know the best way to start either, 
> > but maybe one suggestion is to start with one or two focused things 
> > to clean up, and then ask for volunteers to take on specific modules?
> > Then people can contribute an hour here and there to do cleanup on 
> > their task/module and try to fix that thing in a 1-2-month long 
> > sprint. I am happy to contribute to cleanup, I am responsible for my 
> > fair share of unclean code, but since I don't have strong software 
> > engineering chops it would be good to have people with that 
> > background propose the tasks and describe exactly what needs to be 
> > done. My idea of cleaning is just to delete commented out sections of 
> > evaluation code.
> >
> > Tim
> >
> > ________________________________________
> > From: Alexandru Zbarcea <al...@apache.org>
> > Sent: Friday, November 17, 2017 4:46 PM
> > To: Apache cTAKES Dev
> > Subject: unknown dependencies
> >
> > Hi,
> >
> > I notice that an incorrect dependency has slipped into the code:
> > jdk.internal.org.objectweb.asm.commons.AnalyzerAdapter;
> >
> > Now that the Jenkins build is successful, I think it is easier to 
> > clean up the code. I would like this to be a common effort. I don't know 
> > the best way to approach this.
> >
> > Looking forward to your advice,
> > Alex
> >
>
