I have long been frustrated by BBEdit's refusal to edit files that, 
for good reason, have different line end characters throughout - like 
BBEdit Worksheet files for instance.

Now I have another, similar problem. Adobe has changed the format of 
its .fdf (form data) files. For almost a decade I have been filling 
out US tax forms by creating fdf files with Excel macros. I could 
download the blank PDFs from the US IRS web site and Acrobat would 
politely load the data from the fdf files.

The new format mixes not line endings but UTF-16 and ASCII encodings 
in the same file! Needless to say, BBEdit doesn't handle it well.

Here's what the first few lines look like when opened with BBEdit:

%FDF-1.2
%’“¦"
1 0 obj
<<
/FDF
<<
/Fields [
<<
/V (œ  l i n e   1 4)
/T (œ  f 1 _ 0 5 8 \( 0 \))
>>
<<
/V (œ  l i n e   1 5)
/T (œ  f 1 _ 0 6 0 \( 0 \))
>>
<<
/V (œ  l i n e   6)
/T (œ  f 1 _ 0 4 2 \( 0 \))

Here's the hexdump of the same first few lines produced by BBEdit:

0000: 25 46 44 46 2D 31 2E 32 0A 25 E2 E3 CF D3 0A 31   %FDF-1.2.%’“¦".1
0010: 20 30 20 6F 62 6A 20 0A 3C 3C 0A 2F 46 44 46 20    0 obj .<<./FDF
0020: 0A 3C 3C 0A 2F 46 69 65 6C 64 73 20 5B 0A 3C 3C   .<<./Fields [.<<
0030: 0A 2F 56 20 28 FE FF 00 6C 00 69 00 6E 00 65 00   ./V (œ .l.i.n.e.
0040: 20 00 31 00 34 29 0A 2F 54 20 28 FE FF 00 66 00    .1.4)./T (œ .f.
0050: 31 00 5F 00 30 00 35 00 38 00 5C 28 00 30 00 5C   1._.0.5.8.\(.0.\
0060: 29 29 0A 3E 3E 20 0A 3C 3C 0A 2F 56 20 28 FE FF   )).>> .<<./V (œ 
0070: 00 6C 00 69 00 6E 00 65 00 20 00 31 00 35 29 0A   .l.i.n.e. .1.5).
0080: 2F 54 20 28 FE FF 00 66 00 31 00 5F 00 30 00 36   /T (œ .f.1._.0.6
0090: 00 30 00 5C 28 00 30 00 5C 29 29 0A 3E 3E 20 0A   .0.\(.0.\)).>> .
00A0: 3C 3C 0A 2F 56 20 28 FE FF 00 6C 00 69 00 6E 00   <<./V (œ .l.i.n.

It appears that the unescaped parentheses delimit blocks that are 
encoded as UTF-16. Each block begins with FE FF, which is surely a 
byte order mark. After that come 16-bit entries, the first byte of 
which is a null in every file I have looked at. The escaped 
parentheses are there because the author of the PDF used parentheses 
in his definitions of the form names. Note, though, that the 
backslash escape character is preceded by a null but the parenthesis 
following it is not.
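
One way to check this reading of the hexdump is to undo the backslash 
escapes byte by byte and decode what is left as big-endian UTF-16. The 
Python below is only a sketch: the name decode_fdf_string is made up 
for illustration, and it assumes a backslash always escapes exactly 
the one byte after it (other escapes such as \n or octal codes are 
ignored).

```python
def decode_fdf_string(raw: bytes) -> str:
    """Decode the bytes found between one pair of unescaped parentheses.

    Assumption: a backslash (0x5C) escapes exactly the one byte after it.
    """
    out = bytearray()
    i = 0
    while i < len(raw):
        # Skip the backslash and keep the byte it protects.
        if raw[i] == 0x5C and i + 1 < len(raw):
            i += 1
        out.append(raw[i])
        i += 1
    # FE FF marks the rest of the string as big-endian UTF-16.
    if out[:2] == b"\xfe\xff":
        return bytes(out[2:]).decode("utf-16-be")
    # No byte order mark: treat the string as single-byte text.
    return bytes(out).decode("latin-1")


# Bytes of the first /T string, copied from the hexdump above.
raw = bytes.fromhex("FEFF 0066 0031 005F 0030 0035 0038 005C 28 0030 005C 29")
print(decode_fdf_string(raw))  # f1_058(0)
```

Seen this way, the lone 28 after the backslash is the low byte of a 
16-bit 00 28, with the escape inserted between the two halves.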

So my question is: is there any way I can use BBEdit to post-process 
the plain ASCII files produced by my Excel macros and create a 
version with the mixed ASCII and UTF-16? An AppleScript would be an 
easy way to go, and I care not a whit about speed. Can I tell BBEdit 
to change from UTF-16 to ASCII and back as it writes a file? BBEdit 
uses UTF-16 internally for everything. When it reads my fdf file, 
does it convert 0066 (an f) to 0000 0066, or does it leave the 16 
bits alone by effectively ignoring the null character in the file?

I have looked at reworking my VBA code and that will be a pain. There 
seems to be no way to handle nulls inside a worksheet cell. Perl 
will probably handle the task and I have started on that, but Perl's 
"use unicode" options are not helpful. There is also UNIX sed, which 
might work with a bunch of successive substitutions. Any other ideas? 
This is a once-a-year project and I really don't want to use C for it.
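
For comparison, here is a rough sketch of the post-processing step in 
Python rather than Perl or sed. It rests on several assumptions: that 
the old ASCII files hold each field as /V (text) and /T (name) on one 
line, with any parentheses inside escaped by a backslash; that the 
text is pure ASCII; and that only /V and /T strings need converting. 
The names encode_fdf_string and convert_ascii_fdf are invented for 
this example.

```python
import re


def encode_fdf_string(text: str) -> bytes:
    """Return "(" + FE FF + UTF-16BE bytes + ")", escaping at the byte level."""
    raw = b"\xfe\xff" + text.encode("utf-16-be")
    out = bytearray()
    for b in raw:
        # Any byte that looks like ( ) or \ gets a single backslash byte
        # in front of it, which reproduces the 00 5C 28 pattern in the dump.
        if b in (0x28, 0x29, 0x5C):
            out.append(0x5C)
        out.append(b)
    return b"(" + bytes(out) + b")"


# "/V (...)" or "/T (...)"; the body may contain backslash-escaped bytes.
FIELD = re.compile(rb"(/[VT] )\(((?:\\.|[^\\()])*)\)")


def convert_ascii_fdf(data: bytes) -> bytes:
    """Rewrite every /V and /T string in an ASCII fdf file as UTF-16."""
    def repl(match):
        # Undo the ASCII-level escapes before re-encoding the text.
        body = re.sub(rb"\\(.)", rb"\1", match.group(2))
        return match.group(1) + encode_fdf_string(body.decode("ascii"))
    return FIELD.sub(repl, data)


sample = b"/V (line 14)\n/T (f1_058\\(0\\))\n"
converted = convert_ascii_fdf(sample)
print(converted.hex(" "))
```

The files would need to be read and written in binary mode so that 
the nulls and the FE FF bytes pass through untouched.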

-- 
-> Stocks are getting pelloreid <-
