Sjoerd Op 't Land wrote: > What about (this would be a new feature for MetaCard) multi-char > itemDelimiters? > > set itemDel to quote & comma & quote > repeat for each line m in csvData > put char 2 to -2 of m into tRecord > put item 1 of tRecord into ... > end repeat
That woud work if CSV were consistent. But as implemented in Microsoft products, CVS puts quotes around text values only; numeric values are delimited without quotes, e.g.: "num_hits","page_requested","comments" 2,"/info/index.htm","This is a comment." 5,"index.htm","This is another comment\, which ilustrates how a single field in CVS may also contain escaped quotes and crlf's." An example of the other complication with CSV is the last item of the last row: quoted text elements can contain both returns (crlf on Win) and commas. To distinguish delimiter commas from commas in data, the latter are escaped with "\". I wrote a quickie CSV parser to get the job done (copied below), but I can't help but think there's some clever way to use the matchText or replaceText functions to speed it up (it processes a megabyte in about 2.8 secs on my Mac G4/500MHz/OS 9.2.2, and 4.1 secs on a PII/400MHz/WinME). Suggestions? -- Richard Gaskin Fourth World Media Corporation Developer of WebMerge 2.0: Publish any Database on Any Site ___________________________________________________________ [EMAIL PROTECTED] http://www.FourthWorld.com Tel: 323-225-3717 AIM: FourthWorldInc --====================================================================-- -- -- CSV2Tab -- -- Takes any comma-delimited or tab-delimited tabular data in which -- some values may be contained in quotes and others not, and returns -- a consistently-formatted string with non-data quotes removed and -- in-data returns replaced with a placeholder which can be replaced -- with cr's again later as needed for display (ASCII 11 is borrowed -- from the FileMaker convention for exported data). -- -- After some initial setup, it walks through each char of the data, -- keeping track of whether or not it is inside of a quoted string. -- If so, it collects the data into tmp, and once the closing quote -- is reached it restores escaped quotes and replaces returns before -- copying tmp to the output, tNuData. -- Characters encountered outside of quotes are simply copied to the -- output, with the exception of commas which are converted to tabs. -- function CSV2Tab pData local tNuData -- contains tabbed copy of data local tReturnPlaceholder -- replaces cr in field data to avoid line -- breaks which would be misread as records; -- replaced later during dislay local tEscapedQuotePlaceholder -- used for keeping track of quotes in data local tmp -- collection for data between quotes local tInQuotedText -- flag set while reading data between quotes -- put numtochar(11) into tReturnPlaceholder -- vertical tab as placeholder put numtochar(2) into tEscapedQuotePlaceholder -- used to simplify -- distinction between quotes in data and those -- used in delimiters -- -- Normalize line endings: replace crlf with cr in pData -- Win to UNIX replace numtochar(13) with cr in pData -- Mac to UNIX -- -- Put placeholder in escaped quote (non-delimiter) chars: replace ("\""e) with tEscapedQuotePlaceholder in pData -- put empty into tInQuotedText repeat for each char k in pData if (tInQuotedText) then if k = quote then replace cr with tReturnPlaceholder in tmp replace tEscapedQuotePlaceholder with quote in tmp put tmp after tNuData put empty into tInQuotedText else put k after tmp end if else if k = quote then put true into tInQuotedText put empty into tmp else if k = comma then put tab after tNuData else put k after tNuData end if end repeat -- return tNuData end CSV2Tab _______________________________________________ metacard mailing list [EMAIL PROTECTED] http://lists.runrev.com/mailman/listinfo/metacard
