Hi there, I'm having a problem with some code I've written. The gist of my program (whose code I unfortunately cannot share at this time; I'll have to get approval first) is this:
1. Read elements from an XML file and turn them into objects.
2. Place these objects into hash tables.
3. Do some stuff with those objects.
4. Print them out to various files (depending on the type of the object).

My trouble is this: if I fail to call flush on the output channels in step #4, I get mangled output. By mangled I mean that in the middle of one line, suddenly the data from another line appears. The other line exists elsewhere in the output. Sometimes lines are simply duplicated.

I found this highly strange and thought the problem was in my code at first. But I couldn't find anything, so I decided to make a call to flush after every line I wrote to the output. Suddenly my problem disappeared! My understanding, however, is that flush shouldn't be required to do this correctly. After all, I simply open the output channel, write to it a bunch, and then finally close it.

I haven't yet been able to come up with a simple case that exhibits the problem. I can't share the code with you yet, and I can't share the data either, so I'll try to give as much information as I can.

A. I only call open_out, output_string, output_char, and close_out.

B. Although I link to Batteries (version 1.4.1) I don't use its IO layer. I just call the functions that I need directly (e.g. BatString.join).

C. There are two files that exhibit the problem.

D. The problems in the output file occur in exactly the same position every time, even if the data itself changes!

D1. In one file, it's position 2883585. At that location, it duplicates text from position 794139.

    venatc01 01 Clinton William [email protected] 1234567 J 1600 Pennsylvania Ave Washington DC 12345 US 1 Y

This is a sanitized example of what the output looks like. It's supposed to be the information for user venatc01, but suddenly in the middle of the line the information for a certain Bill Clinton is injected. The row describing Bill Clinton appears earlier in the file.
This particular file is quite long, and there are several duplicate lines: position 2610356 is duplicated at 2883693, position 2435496 is duplicated at 2883819.

D2. In the other file, it's position 20481. At that location, it duplicates text from either position 6434 or 10494. (You can't tell because it's the same data in both spots.)

    line 232: 11667.201210 venatc01 S Y Y
    line 378: 14900.201210 venatc01 S Y Y
    line 737: 1241210 venatc01 S Y Y

Everything after the 124 above is copied either from line 232 or 378.

E. Flushing the output of one file after every line printed fixes that one file, but does not affect the position of the problem in the other file, which remains the same.

F. Adjusting the heap size using OCAMLRUNPARAM=s=4M,i=32M,o=150 had no effect.

G. The problem exists both with byte compilation and native compilation.

H. I'm using OCaml 3.12.1 on Linux x86_64.

I've assumed that you don't need to call flush periodically to avoid problems like this, but maybe that's not the case? Should one expect any problems or difficulties if one doesn't explicitly flush every so often?

If anybody has any ideas on how to debug this, I will be greatly appreciative. I don't know that much about OCaml internals and how to debug things like this. If I can provide some more information, let me know. If it will help to have the code, I'll speak with my boss. In the meantime, I'll keep trying to reproduce with a much simpler program.

Thanks for any thoughts.

--
Taylor C. Venable
http://metasyntax.net/
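
P.S. In case it helps the discussion, the output pattern from item A boils down to roughly the sketch below. This is not the real program: write_row, write_rows, read_lines, and make_rows are made-up names, the tab-separated row layout and the row count are assumptions, and the XML and hash table steps are left out entirely. It opens a channel, writes a few megabytes with output_string and output_char only, never calls flush, closes the channel, and then reads the file back to compare it with what was written. It is meant as a starting point for a reproduction, not as a program known to exhibit the problem.

```ocaml
(* Hypothetical sketch of the write pattern from item A; not the real program. *)

(* Write one record as a tab-separated line, using only output_string
   and output_char. The tab-separated layout is an assumption. *)
let write_row oc fields =
  (match fields with
   | [] -> ()
   | first :: rest ->
     output_string oc first;
     List.iter (fun f -> output_char oc '\t'; output_string oc f) rest);
  output_char oc '\n'

(* Open the file, write every row with no intermediate flush, then close. *)
let write_rows path rows =
  let oc = open_out path in
  List.iter (write_row oc) rows;
  close_out oc

(* Read the file back line by line so it can be compared with the input. *)
let read_lines path =
  let ic = open_in path in
  let acc = ref [] in
  (try
     while true do
       acc := input_line ic :: !acc
     done
   with End_of_file -> ());
  close_in ic;
  List.rev !acc

(* Build n made-up rows; the field values are placeholders. *)
let make_rows n =
  let rec go i acc =
    if i < 0 then acc
    else
      go (i - 1)
        ([Printf.sprintf "user%06d" i; string_of_int i; "S"; "Y"; "Y"] :: acc)
  in
  go (n - 1) []

let () =
  let path = Filename.temp_file "flush_test" ".txt" in
  let rows = make_rows 200_000 in
  write_rows path rows;
  (* rev_map followed by rev keeps the mapping tail-recursive. *)
  let expected = List.rev (List.rev_map (String.concat "\t") rows) in
  let actual = read_lines path in
  Sys.remove path;
  print_endline
    (if expected = actual then "output matches what was written"
     else "output differs from what was written")
```

If a variant of this can be made to differ, the first mismatching line and its byte offset would be the interesting thing to compare against the positions listed under D1 and D2.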
