> Might be worth making a list of project ideas, big and small. "I
> wanted to contribute, but I didn't know where to start" is a common
> enough reason given for not contributing to open source.


Thanks for the suggestion. We have such a list. Perhaps you can suggest
ways to make that list better?
https://sourceforge.net/tracker/?group_id=399595&atid=1657035

Best,
Pablo


On Mon, Mar 5, 2012 at 5:30 PM, Jimmy O'Regan <[email protected]> wrote:

> On 5 March 2012 15:59, Pablo Mendes <[email protected]> wrote:
> >
> >
> >> I've been using
> >> awk -F'\t' '($1>=3){print $0}' < lexic.tsv
> >>
> >> where lexic.tsv is the input to
> >> org.dbpedia.spotlight.util.CreateLexicalizations - I guess now is a
> >> good time to find out if I'm doing it wrong :)
> >
> >
> > Right. If lexic.tsv contains <count,uri,surfaceForm>, and these counts
> > came from the Wikipedia paragraphs (occs.uriSorted.tsv), then I'd say
> > you're doing it right. Do make sure you merge the (uri->sf) entries
> > coming from occurrences with the ones coming from titles, redirects and
> > disambiguations (TRDs), though.
>
> Aha. I had been missing that step.
>
> Also, while we're on this topic, I notice that things like
> '[[las]]ach' are being extracted with the surface form 'las', and not
> 'lasach', as I'd expected. I guess it's not necessary for the DBpedia
> extraction framework, and ISTR that the relevant piece of Mediawiki
> was particularly horrible, but it's something that may be worth adding
> to a FAQ.
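
The behaviour Jimmy expected matches what MediaWiki calls a link trail: letters written immediately after the closing brackets are displayed as part of the link text. The following is a minimal sketch of that idea, not the DBpedia extraction framework's code. The regular expression and the function name `surface_forms` are assumptions made for illustration, and the trail is limited to ASCII a-z here, whereas a real wiki's trail pattern is language-dependent.

```python
import re

# Hypothetical pattern: [[target]] or [[target|label]], followed by an
# optional run of lowercase ASCII letters (the "link trail").
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]([a-z]*)")

def surface_forms(wikitext):
    """Return (target, surface form) pairs, appending the trail to the text."""
    results = []
    for match in WIKILINK.finditer(wikitext):
        target, label, trail = match.group(1), match.group(2), match.group(3)
        # Without an explicit label, the link text is the target itself.
        text = label if label is not None else target
        results.append((target, text + trail))
    return results

print(surface_forms("w [[las]]ach"))
```

With the trail included, the surface form for '[[las]]ach' comes out as 'lasach' while the link target stays 'las'.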
>
> > You can choose whether to do it before or after counting. Merging
> > before counting means that you do not give any special weight to TRDs.
> > Merging after counting means that you consider TRDs to be a special
> > class of mappings that deserve to be included even if they do not occur
> > frequently (this helps with sparsity but may include spurious mappings).
> >
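
The difference between the two merge orders can be sketched as follows. This is an illustration only, not the code in index.sh or CreateLexicalizations: the function names and sample pairs are hypothetical, and the threshold of 3 is borrowed from the awk filter quoted earlier in the thread.

```python
from collections import Counter

def merge_before_counting(occs, trds, min_count=3):
    """TRD pairs are counted like any other occurrence, then filtered."""
    counts = Counter(occs)
    counts.update(trds)
    return {pair for pair, n in counts.items() if n >= min_count}

def merge_after_counting(occs, trds, min_count=3):
    """Occurrences are filtered by count; TRD pairs are always kept."""
    counts = Counter(occs)
    frequent = {pair for pair, n in counts.items() if n >= min_count}
    return frequent | set(trds)

# Hypothetical (uri, surfaceForm) pairs, for illustration only.
occs = [("Berlin", "Berlin")] * 3 + [("Berlin", "the capital")]
trds = [("Berlin", "Berlin, Germany"), ("Berlin", "the capital")]

print(sorted(merge_before_counting(occs, trds)))
print(sorted(merge_after_counting(occs, trds)))
```

In this sample, merging before counting keeps only the pair that reaches the threshold, while merging after counting also keeps both TRD pairs regardless of how often they occur.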
> > See (latest revision):
> > https://spotlight.svn.sourceforge.net/svnroot/dbp-spotlight/trunk/bin/index.sh
> >
> > I do a basic concatenation there. This means that occurrences in
> > Wikipedia pointing at redirects and disambiguations would be missed.
> > The best option would be to extend ExtractCandidateMap to read in the
> > occs directly and do the same job we currently do with
> > cut/sort/grep/sed, plus the transitive closure of URIs. We would love
> > it if anybody volunteered to send us that patch
> > (https://sourceforge.net/tracker/?func=detail&aid=3497056&group_id=399595&atid=1657035).
> > Otherwise, whenever I have some time I'll work on it and include it in
> > the next release.
>
> Might be worth making a list of project ideas, big and small. "I
> wanted to contribute, but I didn't know where to start" is a common
> enough reason given for not contributing to open source.
>
> --
> <Sefam> Are any of the mentors around?
> <jimregan> yes, they're the ones trolling you
>
_______________________________________________
Dbp-spotlight-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users