Neil Williams <[EMAIL PROTECTED]> writes:

> That's the first job, build the DTD and enforce it.
> (abort loading of XML files that don't match the DTD structure - that will
> need to be done by 'proper' C XML handling libraries, libxml etc.)

So you're suggesting a re-write of all the existing XML i/o to use a
verified schema instead of the existing hand-created generators and
parsers?  Not that I'm against this idea, but doesn't that go against
the grain of re-using the existing code?  I suppose that's fine
provided your plan is:

  1) reverse-engineer the existing XML objects into Schemas
  2) re-write the XML code to use the schemas
  3) drop the rewrite in place of the existing code

I just suspect this is a lot of work for (IMHO) little gain.  The
existing parsers do some level of validation, just not cleanly per a
Schema.
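For what it's worth, the "enforce the DTD" step Neil describes would
look roughly like the sketch below with libxml2.  This is a minimal,
untested sketch -- the file names are placeholders, and nothing like
it is wired into the tree today:

    /* Sketch: refuse to import an XML file that doesn't match a DTD.
     * "gnucash.dtd" and "import.xml" are placeholder names.
     * Build with:  gcc validate.c `xml2-config --cflags --libs`        */
    #include <stdio.h>
    #include <libxml/parser.h>
    #include <libxml/valid.h>

    int main (void)
    {
        xmlDocPtr doc = xmlParseFile ("import.xml");
        xmlDtdPtr dtd = xmlParseDTD (NULL, (const xmlChar *) "gnucash.dtd");
        xmlValidCtxtPtr ctxt = xmlNewValidCtxt ();
        int ok = 0;

        if (doc && dtd && ctxt)
            ok = xmlValidateDtd (ctxt, doc, dtd);  /* 1 = valid, 0 = reject */

        if (!ok)
            fprintf (stderr, "import aborted: file does not match the DTD\n");

        if (ctxt) xmlFreeValidCtxt (ctxt);
        if (dtd)  xmlFreeDtd (dtd);
        if (doc)  xmlFreeDoc (doc);
        xmlCleanupParser ();
        return ok ? 0 : 1;
    }

As I say below, though, the validation isn't the hard part.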
>> problem is that the XML subsystem does not have a "merge".  There is
>> no intermediate step of "load Datafile" that will merge into an
>> existing open Datafile.  That's the import step that needs to happen.
>> Yes, we have the code that will read the data and load it into a bunch
>> of objects in RAM.  What we do NOT have is the GUI and logic to merge
>> those objects into an existing datafile-in-RAM.
>
> OK. (I anticipated that it would be a new procedure.)

I suspect this is the majority of the work.  Lots of gotchas.

>> > There must be some level of XML parsing already being performed within
>> > Gnucash file operations. File->Open and File->Save etc.
>> > This would simply be downgraded to import-export.
>>
>> Yes, but you're missing the necessary "merge" logic which currently
>> does not exist.  Yes, the actual I/O functions exist, that's not the
>> hard part.
>
> Not missing it, just concentrating on getting an accurate understanding
> of the problem.

Ok.  The XML i/o is _mostly_ reusable.  I think it would be a small
amount of work to get it to be reusable.  The hard part is definitely
the merging.

> XML formatting can assist in the labelling of data chunks so that it's
> easier to handle collisions. A certain bit of data cannot occur in
> certain elements of the XML, as dictated by the DTD - that precludes
> certain collision events and limits the number of possible problems.
> It doesn't deal with all problems, but it can help with problems that
> CSV would leave behind - right data in the wrong column.

I don't think the data-labeling has anything to do with
collision-detection.  See the QIF and OFX importers as examples.  Their
data is labeled just fine; the majority of the work is NOT in reading
the data file, but in merging the data into gnucash.  Duplicate
detection, account determination, etc. -- that's the tough part.

> Been there, done that. Not looking at the detail until I get the
> structure right. Then I'll start on the detail.
> :-)

But we already have the XML structures defined (albeit not in a
Schema).  Been there, done that -- reuse what we've got unless you
really want to re-write all the i/o.  But if you have limited time I
see little reason to do that.

> At this point, I'm really not keen on CSV as a format for invoices and
> I think I'll have my hands full with the XML data merge problem, so can
> I leave the CSV import function to someone more able / with more time?
> Please?!

Fair enough.  Honestly, if you want to implement a "gnucash XML import"
I'm 100% behind you, and I think it's a great idea.  I'm not trying to
derail that concept.  I think it's a GREAT idea.

>> Yes, the xml parsers need to be modified to not require a full book.
>> Not TOO difficult, I don't think.
>
> As we both know, the problems come after the partial import.

Agreed.

>> >> * transaction matching
>> >
>> > I'll need help with that.  The existing procedures are presumably not
>> > anticipating a merge with existing data but are set to be read into
>> > an otherwise empty memory allocation.
>>
>> EXACTLY!
>
> So that's the bulk of the task. OK.

IMHO, yes.  I think the bulk of the task is the merging.  You need to
determine what data in the import maps to the existing data, what data
is new, and what data is a duplicate.  I think this is the hardest
part.  There's been a good deal of work to do this with Transactions,
but not with Accounts, or any of the business features.

NOTE: a general API to "merge two books" would be a good potential
solution.
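To make the shape of that concrete: nothing like the following exists
in CVS today, and the type and function names are made up purely for
illustration, but the core of the merge is a classification pass over
every imported object, roughly:

    #include <glib.h>
    #include <string.h>

    /* Hypothetical sketch only -- none of these names exist in the
     * GnuCash source.  For each imported object, decide whether it is
     * new, an exact duplicate, or a conflict the user must resolve.   */

    typedef enum { MERGE_NEW, MERGE_DUPLICATE, MERGE_CONFLICT } MergeClass;

    typedef struct {
        const char *guid;      /* stable id, where the format carries one */
        const char *payload;   /* stand-in for "the rest of the object"   */
    } ImportObject;

    /* 'existing' maps guid -> ImportObject* for the already-open book.  */
    static MergeClass
    classify (GHashTable *existing, const ImportObject *obj)
    {
        ImportObject *match = g_hash_table_lookup (existing, obj->guid);

        if (!match)
            return MERGE_NEW;          /* nothing to collide with         */
        if (strcmp (match->payload, obj->payload) == 0)
            return MERGE_DUPLICATE;    /* same object imported twice      */
        return MERGE_CONFLICT;         /* same id, different data: ask    */
    }

Real matching is messier than a GUID lookup, of course -- objects
without a stable id need the kind of heuristic matching the QIF/OFX
importers already do -- but the point is that the MERGE_CONFLICT pile
is what needs user input, and it should be collected and presented in
one go.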
>> > Is it acceptable to have a very simple rule?
>>
>> Depends on the rule.  Regardless, it requires user input.
>
> So each collision event is raised with the user - correct?

Yes.  Although it might behoove you to collect all the events and save
the applause until all the names have been called.  If you've got 100
events, would you rather have 100 pop-ups or a window with a list of
100 items?  Personally I'd rather have the latter.

> I was thinking about my original idea about adding invoices to the data
> file/source. That uses existing data as a reference but adds new data.
> I guess the problem would be if the user tried to import the same file
> twice. That brings in the problem above, as well as in cases of a more
> extensive import of other objects that may well require overwriting
> existing data.

See above.  That's definitely one problem.  Another problem is trying
to add data if you don't (necessarily) have an internal data reference.

>> Not what I meant.  You may need to perform a transaction match or
>> duplicate check.  This has nothing to do with XML input and everything
>> to do with data coherency.
>
> Could you tell me where I might find an example of a duplicate check
> and transaction lookup in the existing CVS code? It'll help me to see
> how gnucash is structured. (There's an awful lot of code to look
> through if I don't know where to start). Also, which files in the CVS
> deal with the XML file I/O?

See the code under src/import-export -- you'll find a bunch of
transaction matching and duplicate detection work.  It would need to
get extended to general objects instead of just transactions.  But if
you do this I think the code could definitely be re-used.  The XML i/o
is in src/backend/file/*.

-derek

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       [EMAIL PROTECTED]                       PGP key available