Hello all,

replying to both of Giulio's messages from today.

> One question, Giulio: in et.c, in what is now line 523, why did you add
> `c` as an unsigned long? Is there some particular reason not to use size_t?

>
> Portability when you want to use *scanf and *printf functions.
> size_t is machine dependent and related to memory management. It is large
> enough to contain any memory address and should be used when referencing
> memory.
> So it is the right type for malloc, array indexes, element counts and
> so on. On many modern systems it is 64 bits wide and is often defined as
> unsigned long.
>
> unsigned long is also machine dependent. It is guaranteed to be at least
> 32 bits and, on many modern systems, is 64 bits as well.
>
> In *scanf and *printf unsigned long is the largest unsigned type that can
> be used portably (including C89-only compilers). And this is why I used it
> there.
>
> Anyway, we are already using C99 flags and the z modifier is available in
> C99... So this is maybe a non-issue at all.
> However, while I am not strict at all about this, I prefer to have
> C89-compatible code if possible.
>
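
For reference, the two options look roughly like the sketch below. The
function names are made up for illustration and nothing here is taken from
et.c; it only shows the C89-style cast next to the C99 `z` modifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* C89-compatible: cast to unsigned long and print with %lu.
 * Assumption: the value fits in unsigned long. That holds on most
 * platforms, but not where size_t is wider than unsigned long. */
int format_count_c89(char *buf, size_t n)
{
    assert(buf != NULL);
    return sprintf(buf, "%lu", (unsigned long)n);
}

/* C99 and later: the z length modifier takes a size_t directly. */
int format_count_c99(char *buf, size_t n)
{
    assert(buf != NULL);
    return sprintf(buf, "%zu", n);
}
```

Both write the decimal digits into a caller-supplied buffer (which must be
large enough) and return the number of characters written.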

Oh yes, this is an issue with the code. Ingo mixed size_t, ints and longs
somewhat freely, probably because, as acopost was a tool written for a
thesis in the '90s, he did not have to worry too much about different
systems. In fact, this was by far the hardest part to fix to make acopost
compile with newer versions of gcc, and I was almost lost when I tried to
make it work on a 64-bit system when I finally got my hands on one -- Ulrik
was the one who fixed pretty much everything.

Not that this is really a concern, but in that specific line of code we
are comparing an unsigned long (guaranteed to be at least 32 bits) with an
int (at least 16). The real issue is that the `tagcount` variables in et.c
would be better as size_t or simply unsigned; we can work on that later
(it's better to get it working, with testing, before meddling any further,
in my opinion).
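
To make the point about mixed comparisons concrete, here is a small sketch.
The function and parameter names are hypothetical and this is not the code
from et.c; it only illustrates what the usual arithmetic conversions do
when an unsigned long meets an int.

```c
/* What a plain `c < count` does: the int is converted to unsigned long
 * first, so a negative count turns into a very large positive value.
 * The cast below just spells out that implicit conversion. */
int naive_less(unsigned long c, int count)
{
    return c < (unsigned long)count;
}

/* A more defensive variant: deal with the negative case before
 * converting, so the result matches the mathematical comparison. */
int safe_less(unsigned long c, int count)
{
    if (count < 0)
        return 0;   /* no unsigned value is below a negative count */
    return c < (unsigned long)count;
}
```

With non-negative counts both behave the same, which is why this is not a
real problem as long as the count never goes negative; an unsigned type for
the count would remove the question altogether.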

[other message]

> I just had a quick look at the changes.
> I think most of them were fine.
> I would avoid leaving too many comments in the code anyway, as comments
> require maintenance in order to be useful.
> So I dropped comments that were not very useful.
> In particular, with reference to primes.c, the comment was useful for me
> to accept the change without asking, but is not useful in general. BTW: did
> you really find a compiler that raised a warning in that case (it was a
> false positive)?
> The comments about casting are redundant because they just state what is
> done immediately after, so they are not useful either.

> Other less obvious comments (e.g. the comment about alpha and c in met.c),
> such as those that explain blocks of code or that remind the reader of
> something that will happen later, are very useful to the reader.
>
> With respect to commit messages, it is usually considered best practice to
> keep the first line short and leave longer explanations for later lines
> (you can use more than one line). In this way git log fits a standard
> terminal width and is not hard to read.

===

Yes, it was giving a false positive in clang-3.5, under Ubuntu. When I get
some C code to study/fix, I usually run it in clang first and in gcc later.

Regarding the comments, both in the code and in git, thank you for
instructing me; I wasn't aware that commit messages could span multiple
lines. Given that they can be long and are stored in the history, and we
can see who did what and when, most of the comments in the code weren't
really needed. I'll do it right next time. :)


> >     > [scripting and python]
> >
> > Oh, you should have seen some crazy Perl code for tagging and parsing
> > Brazilian Portuguese that people used ten/fifteen years ago. Being written
> > by linguists who learned Perl and programming in general as they were
> > writing the system, it was unreadable even by Perl's standards. :)
>
> I believe you. I used to say that perl is a write only language. ;-)
>

Even worse, the system I am referring to ran under MS-DOS/Windows and
adapted the results of tagging/parsing to the syntax of a certain Varbrul,
yet another DOS program, intended for statistical analysis of
pronunciation/phonology; as piping in DOS was not well known or easy, this
was done from a .BAT file. Not a pretty thing. (Varbrul, by the way, is a
permanent problem for me: in 2008, at the university where I studied, they
still had a machine with Windows 95 just to run the program, no matter how
much I tried to tell people about quantitative methods in Python or R.)


> I agree, the situation is much better now that python 3 is getting used
> more and more. And indeed, if python has to be used, I would suggest that
> python 3 be used.
> Consider that a tool containing python 2 code will not be accepted in
> Debian anymore (and I still have the goal to try to have acopost in Debian
> as well).
>

Lua gets more interesting as the discussion goes on... But, again, there is
no point in discussing it now. Just as a reference:
https://github.com/starwing/luautf8


> It would be nice to have that tool available. Especially if it can provide
> deterministic output that can be used in tests.
>

I won't be able to check my backups until next week; if I don't find it, I
can always rewrite it. I checked the corpus and now have a vague memory of
distributing words according to Zipf's law and building a language model
from the Brown corpus, selecting the chain of tags with a simple weighted
random function.
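
If I do end up rewriting it, the core would probably look something like
the sketch below. This is not the original tool, only my rough idea of it:
all the names are invented, the weights are the plain 1/rank form of Zipf's
law, and the small generator is there just so that the output is the same
on every machine (which is what we want for tests).

```c
#include <assert.h>
#include <stddef.h>

/* Fill w[0..n-1] with unnormalised Zipf weights: rank r gets 1/r. */
void zipf_weights(double *w, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        w[i] = 1.0 / (double)(i + 1);
}

/* Pick an index with probability proportional to its weight.
 * u must be in [0, 1); the caller supplies it so that runs can be
 * made reproducible. */
size_t weighted_pick(const double *w, size_t n, double u)
{
    double total = 0.0, acc = 0.0, target;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        total += w[i];
    target = u * total;
    for (i = 0; i < n; i++) {
        acc += w[i];
        if (target < acc)
            return i;
    }
    return n - 1;   /* guard against floating-point rounding */
}

/* Small linear congruential generator kept to 31 bits of state, so the
 * sequence does not depend on the platform's rand(). */
double next_uniform(unsigned long *state)
{
    *state = (*state * 1103515245UL + 12345UL) & 0x7fffffffUL;
    return (double)*state / 2147483648.0;
}
```

Seeding `next_uniform` with a fixed value and feeding its result to
`weighted_pick` gives the same sequence of choices on every run; the same
picker would work for tags if the weights come from the language model
instead of `zipf_weights`.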