Re: [Apertium-stuff] [Gsoc 2011] Flag Diacritics in lttoolbox

Francis Tyers Tue, 22 Mar 2011 11:56:11 -0700

On Tue, 22 Mar 2011 at 17:55 +0000, Jimmy O'Regan wrote:
> On 22 March 2011 17:04, Mohit Taneja <[email protected]> wrote:
> > Hi,
> >
> > I have been reading about the use of flag diacritics in morphological
> > analysis
> > (http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Flag_diacritics_in_lttoolbox).
> > I understand the need for them, which arises mostly in languages that
> > have prefix inflection as well as circumfix inflection, in addition to
> > suffix inflection.
> >
> > So, when checking the different analyses/generations from a root word,
> > there can be certain pairs of suffix and prefix inflections that are just
> > not possible together; flag diacritics are used to rule those out.
> >
> > But I do not understand how this is currently done at compile time, or
> > how we can port this functionality to runtime. I have been
> > trying to read up on it in the FSM book. I also checked out the code from
> > svn and compiled lttoolbox, and then tried to lt-expand the
> > dictionary given here
> > http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Flag_diacritics_in_lttoolbox
> > . When doing so, I get an error: Error (19): Invalid node '<cdefs>'.
> 
> That's just a speculation as to how it would look. Your mission,
> should you choose to accept it, would be to /also/ implement the
> change to the dictionary format.
> 
> Presumably, going by that page, the implementation would keep a second
> Alphabet of symbols (the cdefs)[1], and each transduction would be
> checked for those symbols, and, if present, that there is 1) more than
> one and 2) that they match, otherwise the transduction is discarded.
> The code to discard a transition is already implemented for compounds
> -- what remains is only the change to the compiler to add the two new
> XML elements, and the new runtime check for those symbols.
> 
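A minimal sketch of the runtime check described above, assuming a
transduction is available as a sequence of integer symbol codes and the
cdef codes are held in a set. The name keepTransduction and both
parameter types are hypothetical, not existing lttoolbox API; the rule
itself (more than one constraint symbol, and all of them matching) is
taken from the paragraph above.

```cpp
#include <set>
#include <vector>

// Hypothetical helper, not part of lttoolbox.
// Returns true when the transduction should be kept:
//  - it contains no constraint (cdef) symbols at all, or
//  - it contains more than one, and they are all the same symbol.
// A lone constraint symbol, or two different ones, means discard.
bool keepTransduction(const std::vector<int>& path,
                      const std::set<int>& cdefs)
{
  int first = 0;  // first constraint symbol seen on this path
  int count = 0;  // number of constraint symbols seen so far

  for (int sym : path)
  {
    if (cdefs.count(sym) == 0)
    {
      continue;  // ordinary letter or tag, not a constraint
    }
    if (count == 0)
    {
      first = sym;
    }
    else if (sym != first)
    {
      return false;  // mismatching constraint symbols
    }
    ++count;
  }

  return count != 1;  // exactly one constraint symbol is unpaired
}
```

With cdefs = {-5, -6}, a path carrying -5 twice is kept, while a path
carrying -5 once, or -5 together with -6, is discarded.
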
> [1] This is probably a little impractical - it would probably be
> better to just add them the same way as a regular sdef, and keep a
> list of the integers corresponding to cdefs:
> 
> void
> Compiler::procCDef()
> {
>   // Build the symbol name from the n attribute of the element being read
>   wstring const symbol = L"<" + attrib(COMPILER_N_ATTR) + L">";
> 
>   // If it's already defined, it may have been as an sdef
>   if(alphabet.isSymbolDefined(symbol))
>   {
>     wcerr << L"Error (" << xmlTextReaderGetParserLineNumber(reader);
>     wcerr << L"): Symbol already defined: '" << symbol << L"'." << endl;
>     exit(EXIT_FAILURE);
>   }
> 
>   // Register it like a regular sdef, and remember its integer code
>   alphabet.includeSymbol(symbol);
>   cdefs.push_back(alphabet(symbol));
> }
> 
> (you would need to add
> list<int> cdefs;
> to compiler.h in lttoolbox, and add code to write the list to the
> output, but that shouldn't take more than 5 minutes; doing it this
> way, the code to handle <c/> would be exactly the same as for <s/>)
> 
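A rough sketch of what "write the list to the output" could look like,
assuming a plain-text layout of a count followed by one code per line.
This layout and the names writeCDefs/readCDefs are made up for
illustration; lttoolbox has its own binary format for compiled
dictionaries, which this does not reproduce.

```cpp
#include <cstddef>
#include <istream>
#include <list>
#include <ostream>
#include <sstream>

// Hypothetical helper: write the number of cdef codes, then each code,
// one per line (illustrative text layout only).
void writeCDefs(const std::list<int>& cdefs, std::ostream& out)
{
  out << cdefs.size() << '\n';
  for (int code : cdefs)
  {
    out << code << '\n';
  }
}

// Hypothetical helper: read back what writeCDefs produced.
// Stops early if the stream runs out or holds something unparsable.
std::list<int> readCDefs(std::istream& in)
{
  std::list<int> cdefs;
  std::size_t count = 0;
  in >> count;
  for (std::size_t i = 0; i < count; ++i)
  {
    int code = 0;
    if (!(in >> code))
    {
      break;
    }
    cdefs.push_back(code);
  }
  return cdefs;
}
```

The runtime side would read the list back once at load time and use it
to recognise constraint symbols while checking transductions.
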
> You should probably think of some other project to add to your
> proposal, because I really don't think this would take 3 months (or
> even 3 weeks) to implement. Jacob already did the hard part of it for
> compounds.


The hard part will be testing, and making sure it behaves as expected. 

We would probably also ask you to implement a lexicon using this new
feature -- e.g. Kurdish -- or perhaps convert an existing one like
Tajik. Depends on the languages you know of course.

Fran

