John Redford <[email protected]> writes:
...
> I didn't attend, and perhaps someone already made this point, but it
> often seems to be lost in Perl and Perl-like communities where
> sequential IO is considered the only kind of IO, and memory
> allocation is hidden from the developer.
>
> There are two generally significant "wins" for fixed-record formats.
>
> 1. Static allocation. Records can be read without reallocating or
> wasting memory. This is faster/better for obvious reasons. Many
> languages make it possible to map a fixed structure over the memory
> used, so inner elements can be accessed conveniently -- and unions
> make it possible to access conditional content nicely.
>
> 2. Dynamic record access. When data is stored on a filesystem,
> mmap/CreateFileMapping can be used to map the record-data and
> offsets can index to an arbitrary record directly.
>
> There are many languages that can take advantage of these factors.
> Perl is bad at both.
>
> Using a format like XML or JSON can be nice in other ways. But it's
> going to be less efficient and it's going to require full parsing of
> an entire document (barring exceptional exit of a streaming parse)
> and dynamic allocation of memory.
>
> There is no "better" choice -- only "better for the situation". (But
> if you define the "situation" as "I want to use Perl", then you're
> starting with a narrow viewpoint.)
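
For anyone who hasn't done this in C, here's a rough sketch of what those
two wins look like in practice. The record layout (a 64-byte "invoice"
record) is made up purely for illustration -- the field names, offsets,
and lengths aren't anyone's actual format:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* 1. Static allocation: one fixed structure laid over the raw bytes.
       All members are char arrays, so there is no compiler padding and
       sizeof(struct invoice_rec) == 64.  (A union of several such layouts
       would handle conditional/variant record content.) */
    struct invoice_rec {
        char invoice_no[10];  /* offset  0, length 10 */
        char customer[20];    /* offset 10, length 20 */
        char amount[9];       /* offset 30, length  9 */
        char filler[25];      /* pad out to 64 bytes  */
    };

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s datafile record-number\n", argv[0]);
            return 1;
        }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* 2. Dynamic record access: mmap the whole file and index straight
           to an arbitrary record -- no sequential reads, no per-record
           allocation, no parsing. */
        const struct invoice_rec *recs =
            mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (recs == MAP_FAILED) { perror("mmap"); return 1; }

        size_t n = strtoul(argv[2], NULL, 10);
        if ((n + 1) * sizeof *recs > (size_t)st.st_size) {
            fprintf(stderr, "record %zu is past end of file\n", n);
            return 1;
        }

        /* the fields are not NUL-terminated, so limit the print width */
        printf("record %zu: invoice %.10s, amount %.9s\n",
               n, recs[n].invoice_no, recs[n].amount);

        munmap((void *)recs, st.st_size);
        close(fd);
        return 0;
    }
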
Maybe I should point out that Uri's wording of this preference expressed
no great love for XML. Also, this fixed-length format he dealt with, and
the ones where I am (though not as bad, I think), were strange in ways
that went way beyond just using fixed-length records (e.g. embedded back
references to other records).

I'm rather less "in the Perl community" than I would like to be, and the
preference for XML over fixed-length records here came from C++
programmers. I'm not disagreeing with your points on the advantages of
fixed-record files (I'll leave it to others whether Perl is inherently
unsuited to imitating COBOL); it's only that for these people XML had a
strong draw, even with potentially large amounts of data, because it
eased debugging bad files, was somewhat more self-documenting, and had
ways of doing some basic validation, without extra code, using DTDs and
then schemas (okay, also, in those days "doing XML" was some kind of
weird marketing position). It's a royal pain to be given a bad file and
have to pull up some Excel spreadsheet, pick out the 17th tab, the 5th
subgrouping, and the 47th column to read off "Invoice amount, offset
137, length 9," then type ctrl-u 137 ctrl-f in emacs to get to the right
place and mark off a region 9 characters wide so the number doesn't blur
into the next one. But yeah, XML isn't small, and programs reading it
aren't so quick.

> As for "standards"... There's no such thing. Performance is real --
> standards are just agendas. If you want to talk to X, then you'll
> speak what X speaks. Period. You can tell X that "French is the
> language of diplomacy" or "XML is the language of the future", but
> when X ignores your XML & French, you can either speak what X wants,
> or not talk to X. Standards are just what the powerful call their
> own preferences.

I'm just wondering which powerful people had it out over these standards
in this particular case (my case, not Uri's; his fixed-record file seemed
to have a standard). Because if it was the powerful people who buy the
software that has to read the EDI providers' output files, they should
have standardized that instead of the EDI providers' input (i.e. whatever
flows across these proprietary networks). It would have saved a lot of
work. Then again, I work with someone whose whole job is dealing with
these things, and she probably enjoys getting paid, so...

-- 
Mike Small
[email protected]

_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm

