On Tue, 22 Mar 2011 at 17:55 +0000, Jimmy O'Regan wrote:
> On 22 March 2011 17:04, Mohit Taneja <[email protected]> wrote:
> > Hi,
> >
> > I have been digging into the use of flag diacritics in morphological
> > analysis
> > (http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Flag_diacritics_in_lttoolbox).
> > I understand the need for them, which arises mostly when languages
> > have prefix inflection as well as circumfix inflection, in addition
> > to suffix inflection.
> >
> > So, when one is checking the different analyses/generations from a
> > root word, there can be certain pairs of suffix and prefix inflections
> > that are simply not possible together, and we use flag diacritics to
> > rule them out.
> >
> > But I am not able to understand how this is currently done at compile
> > time, and how we can port this functionality to runtime. I have been
> > trying to read up on it in the FSM book. I also checked out the code
> > from svn and compiled lttoolbox. With that I tried to lt-expand the
> > dictionary given here:
> > http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Flag_diacritics_in_lttoolbox
> > When doing so, I get an error: Error (19): Invalid node '<cdefs>'.
>
> That's just a speculation as to how it would look. Your mission,
> should you choose to accept it, would be to /also/ implement the
> change to the dictionary format.
>
> Presumably, going by that page, the implementation would keep a second
> Alphabet of symbols (the cdefs)[1], and each transduction would be
> checked for those symbols: if any are present, there must be 1) more
> than one of them and 2) they must match; otherwise the transduction is
> discarded. The code to discard a transition is already implemented for
> compounds, so all that remains is the change to the compiler to add
> the two new XML elements, and the new runtime check for those symbols.
>
> [1] This is probably a little impractical - it would probably be
> better to just add them the same way as a regular sdef, and keep a
> list of the integers corresponding to cdefs:
>
> void
> Compiler::procCDef()
> {
>   // Build the symbol name from the element's n attribute
>   wstring const symbol = L"<" + attrib(COMPILER_N_ATTR) + L">";
>
>   // If it's already defined, it may have been as an sdef
>   if(alphabet.isSymbolDefined(symbol))
>   {
>     wcerr << L"Error (" << xmlTextReaderGetParserLineNumber(reader);
>     wcerr << L"): Symbol already defined: '" << symbol << L"'." << endl;
>     exit(EXIT_FAILURE);
>   }
>
>   // Register it like a regular sdef, and remember its integer code
>   alphabet.includeSymbol(symbol);
>   cdefs.push_back(alphabet(symbol));
> }
>
> (you would need to add
>   list<int> cdefs;
> to compiler.h in lttoolbox, and add code to write the list to the
> output, but that shouldn't take more than 5 minutes; doing it this
> way, the code to handle <c/> would be exactly the same as for <s/>)
>
> You should probably think of some other project to add to your
> proposal, because I really don't think this would take 3 months (or
> even 3 weeks) to implement. Jacob already did the hard part of it for
> compounds.

The hard part will be testing, and making sure it behaves as expected.
We would probably also ask you to implement a lexicon using this new
feature (e.g. Kurdish), or perhaps convert an existing one like Tajik.
It depends on the languages you know, of course.

Fran
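
For illustration, the runtime check Jimmy describes above (if any cdef symbols appear in a transduction, there must be more than one and they must all match, otherwise the transduction is discarded) could be sketched roughly as follows. This is only a standalone sketch under assumptions: it uses plain standard-library containers rather than lttoolbox's own classes, the function name keepTransduction is hypothetical, and it assumes the cdef codes have already been read back from the compiled dictionary into a set.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper, not part of lttoolbox.
// `symbols` holds the integer codes of the symbols in one finished
// transduction; `cdefs` holds the codes that were declared as cdefs
// at compile time (assumed to have been loaded already).
// Returns true if the transduction should be kept.
bool keepTransduction(const std::vector<int>& symbols,
                      const std::set<int>& cdefs)
{
  int first = 0;          // first cdef symbol seen, valid once count > 0
  std::size_t count = 0;  // how many cdef symbols were seen

  for (int sym : symbols)
  {
    if (cdefs.count(sym) == 0)
      continue;           // ordinary symbol, nothing to check

    if (count == 0)
      first = sym;        // remember the first flag
    else if (sym != first)
      return false;       // two different flags: they do not match

    ++count;
  }

  // No flags at all: keep. Exactly one flag: unpaired, discard.
  // Two or more matching flags: keep.
  return count != 1;
}
```

Whether a single unpaired flag should really cause a discard, and where in the analysis loop such a check would be called, are design decisions the thread leaves open; the sketch simply follows the "more than one, and they match" wording literally.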
