>>>>> "AB" == Alex Brelsfoard <[EMAIL PROTECTED]> writes:
AB> We are reading in feeds (typically TSV or CSV files), parsing
AB> them, reformatting the data, and spitting it out as another file.
AB> These feeds can come from all sorts of people/places/things. So
AB> sometimes they are wonderfully formatted and we understand their
AB> content. Sometime they are not, and we do not.

AB> Their CMS may just enclose this field in quotes and move on.

if the field is in quotes, then i think a CSV module can parse it for
you regardless of embedded line breaks. check them out on CPAN and see
what they offer.

AB> So now we have a feed where the description column may have
AB> linebreaks in it. So I can't just split on any form of linebreak.

so you need to come up with a better spec for what a 'row' is and
where embedded line breaks can occur. again, showing some real data
will help. note that we need to see the actual line break problem,
without your mailer also wrapping lines, which would make it
impossible to figure out.

AB> Does this make a bit more sense?

sorta, but concrete examples are needed. there may be clues as to how
to parse these lines, such as embedded line breaks only happening
around certain types of chars. if they are truly random then you may
be in deep doodoo of heuristics. i was there once too when parsing
various data feeds, and there was no way to get it 100% correct.

AB> btw, there's no chance that I could define $/ as a regex could I?

File::Slurp supports a regex for the line separator. set $/ to a
regex (localized) before calling read_file in list context and you
should get the lines you want.

uri

-- 
Uri Guttman  ------  [EMAIL PROTECTED]  --------  http://www.sysarch.com --
-----  Perl Architecture, Development, Training, Support, Code Review  ------
-----------  Search or Offer Perl Jobs  -----  http://jobs.perl.org  ---------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------

_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
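On the quoted-field case: Text::CSV (or Text::CSV_XS) from CPAN, constructed with binary => 1 and read through its getline method, is the module route for line breaks inside quoted fields. Purely to illustrate the idea behind that, here is a minimal core-Perl sketch that only finds record boundaries: it keeps appending physical lines until the double quotes balance. It assumes RFC 4180-style quoting (a literal quote inside a field is doubled as ""); read_record is a hypothetical name, not part of any module.

```perl
use strict;
use warnings;

# read_record is a hypothetical helper, not from any CPAN module.
# It returns one logical record, joining physical lines while we are
# still inside a double-quoted field. It assumes RFC 4180-style quoting,
# where a literal quote inside a field is written as "", so a complete
# record always contains an even number of '"' characters.
sub read_record {
    my ($fh) = @_;
    my $record;
    while (defined(my $line = <$fh>)) {
        $record = defined $record ? $record . $line : $line;
        # tr/// with an empty replacement list just counts the matches
        my $quotes = ($record =~ tr/"//);
        return $record if $quotes % 2 == 0;    # quotes balanced: record done
    }
    return $record;    # EOF: unterminated last record, or undef if none
}

# Demo on an in-memory file: record 2's field spans two physical lines.
my $data = qq{id,desc\n1,"plain"\n2,"first line\nsecond line"\n3,"say ""hi"""\n};
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
while (defined(my $rec = read_record($fh))) {
    chomp $rec;
    print "[$rec]\n";
}
close $fh;
```

This does not split fields; for that, and for the many quoting edge cases real feeds produce, a CSV module is still the better choice.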
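One possible "better spec for a row", for unquoted TSV where stray line breaks land inside fields: every real row has the same number of tab-separated fields as the header, so a physical line with too few fields is a fragment to be glued onto the next one. This is a sketch under stated assumptions (no tabs inside fields, and no breaks inside the last column); read_tsv_row is a hypothetical name.

```perl
use strict;
use warnings;

# read_tsv_row is a hypothetical helper. It assumes every real row has
# exactly $nfields tab-separated fields, that no field contains a tab,
# and that line breaks never fall inside the last column. Physical lines
# are glued together (keeping the embedded "\n") until the row has
# enough fields.
sub read_tsv_row {
    my ($fh, $nfields) = @_;
    my $row;
    while (defined(my $line = <$fh>)) {
        $row = defined $row ? $row . $line : $line;
        my $text = $row;
        chomp $text;
        # LIMIT of -1 keeps trailing empty fields ("a\tb\t" is 3 fields)
        my @fields = split /\t/, $text, -1;
        return \@fields if @fields >= $nfields;
    }
    return if !defined $row;            # clean EOF
    chomp $row;
    return [ split /\t/, $row, -1 ];    # short final row: caller decides
}

# Demo: the header gives the expected field count; row 2's name field
# was broken across two physical lines.
my $tsv = "id\tname\tdesc\n1\tfoo\tone line\n2\tbar\nbaz\tmore text\n";
open my $fh, '<', \$tsv or die "cannot open in-memory file: $!";
chomp(my $header = <$fh>);
my @columns = split /\t/, $header, -1;
while (my $fields = read_tsv_row($fh, scalar @columns)) {
    print join('|', @$fields), "\n";
}
close $fh;
```

The weak spot is the last column: a break there leaves the first fragment with a full field count, so this test alone cannot see it, and some other clue in the data (such as a recognizable start-of-row pattern) would be needed.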
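On the $/ question: core Perl always treats $/ as a literal string, never a regex (perlvar says so explicitly), so the regex behaviour has to come from somewhere else, such as the File::Slurp read_file route described above. Without that module, the same effect can be had by slurping the file and splitting on a pattern yourself. This sketch assumes, hypothetically, that every real row starts with a numeric ID followed by a tab, so only a newline followed by that pattern ends a record; split_records and $record_sep are illustrative names.

```perl
use strict;
use warnings;

# Hypothetical record separator: a newline ends a record only when the
# next line starts with a numeric ID and a tab. Any other newline is
# assumed to be embedded in a field. The lookahead is zero-width, so
# the ID stays at the front of the following record.
my $record_sep = qr/\n(?=\d+\t)/;

# split_records is a hypothetical helper: slurp the whole handle, then
# split on $sep, which plays the role a regex $/ would have played.
sub split_records {
    my ($fh, $sep) = @_;
    my $text = do { local $/; <$fh> };    # local $/ is undef: slurp mode
    return () unless defined $text && length $text;
    $text =~ s/\n\z//;                     # drop the file's final newline
    return split $sep, $text;
}

my $data = "1\tfoo\tshort\n2\tbar\ta description\nthat wraps\n3\tbaz\tdone\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
for my $rec (split_records($fh, $record_sep)) {
    my @fields = split /\t/, $rec, -1;
    printf "%d fields: %s\n", scalar @fields, join('|', @fields);
}
close $fh;
```

The catch is the heuristics problem Uri warns about: if a wrapped description line can itself begin with digits and a tab, this pattern will split it wrongly, so the separator has to come from studying the real data.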

