Alexey, cool feature indeed!
Some comments:

* I could not figure out what the quality criteria are in your current
  scripts :) Is it just the number of certain words? What are they? (I
  could not grok your script at first glance, sorry)
* Old style of license header in check_doc_quality.sh
* check_doc_quality.sh is hardly readable. You use autogenerated perl
  files from a shell script. I think a single perl script would be much
  more readable and maintainable. You also use uniq, sort, awk, etc. All
  of these can be done in perl naturally.
* a source file quality estimate would also be good (so I'd be happy to
  doxygise more and more files)
* feature request: output files edible by harmonytest.org? So the whole
  doc quality metric can be tracked
* just a suggestion: let's choose the best documented file and make it a
  reference, so we could document sources in a similar style. It should
  also help people who have never used Doxygen in their life (like me :)
  to survive in this doc hunt.

On the 0x216 day of Apache Harmony Alexei Fedotov wrote:
> Nadya,
>
> You asked good questions. Here are a few answers:
>
> 1. Grouping of results is implied by documentation grouping. Scripts
> can process any documentation bundle, so if one creates a smaller or
> specific bundle, the list will be shorter, or more specific. Creating
> several documentation bundles in different directories makes their
> comparison an easy task - I can do this comparison.
>
> 2. Personally I like @page tags and package.html files. I appreciate
> Salikh's efforts of creating wonderful technical descriptions - I
> referred to them as masterpieces. I also remember that you asked me to
> create a narrative section for a component manager a few months ago. All
> Doxygen documentation will be on the web site. Why shouldn't these
> narrative sections be evaluated?
>
> 3. I don't think the rating of pages such as a list of functions
> should be neglected. Any .html page which can be viewed by a user
> should be readable.
> That is the reason why I parse .html files in the script, not sources.
>
> 4. I believe establishing the connection between .html files and source
> files can be automated. I don't know how to write a short script for
> that, because sometimes an .html page depends on several source files,
> and vice versa.
>
> 5. You can imagine the following pie chart from the data: 2680 pages
> of 2922 (91%) are not good and should be revised.
>
> 6. Today the community discussed removing GC V4. This would
> immediately decrease GC documentation size. It would also decrease the
> number of well documented pages on garbage collection, since new GCs
> don't have as many comments in their code as old GC V4.
>
> Thank you for nice catches,
> Alexei
>
> On 11/2/06, Morozova, Nadezhda <[EMAIL PROTECTED]> wrote:
> > Wow! Alexei, great start!
> > ... and so many pages marked with 0 rank. I really appreciate your
> > effort - it sets me back on earth and to work. I hope this rating would
> > also make owners of code more ambitious, and would inspire them to write
> > more/better comments to get a better rating :)
> >
> > Question on output measurement: can we rate source files or code
> > elements (structure, functions, etc) instead of html files?
> > My concerns:
> > - Many html files are autogenerated, their rating is N/A. Examples:
> > todo.html, functions_vars_0x68.html (listing of links to functions in
> > alphabetical order - there are many pages like that)
> > - Some results are duplicated, because each struct/function has an
> > individual rating + rating of the file/group reference it belongs to.
> > - Some files have a high rating (see the top candidate, for example),
> > but they are generated from comments marked with @page. These don't
> > belong to specific code, but create a narrative section. Evaluating
> > these is complex, and perhaps, should not be done.
> > My personal preference would
> > be to move such generic explanations to component docs on the website
> > and reserve Doxygen docs to API reference as much as possible (this is a
> > subject for further discussion).
> > - the listing of files is SO LONG... grouping them by
> > file/component/type or otherwise organizing the output would make the
> > whole rating more readable. I mean, from the current version, I can only
> > make out the leaders (not files even, individual bits of them), and
> > understand that the majority have 0 rating. This has its instructional
> > impact, but I cannot see the areas where we are the best - bearable -
> > worst, or see the approx distribution of powers... missing that info
> > leaves me without direction on what to do.
> >
> > Question on data presentation: do you think we can have some post
> > processing of the raw data that your script produces - to see the big
> > picture? We have some metrics: graphics, pie charts, anything. This
> > would instantly show the most important conclusions. I could do such
> > metrics and post them regularly on Wiki. If anybody says they need such
> > kind of info, I'd volunteer to help.
> >
> > Thank you,
> > Nadya Morozova
> >
> >
> > -----Original Message-----
> > From: Fedotov, Alexei A [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, November 02, 2006 11:33 PM
> > To: harmony-dev@incubator.apache.org
> > Subject: [doc] What should be improved in DRLVM Doxygen documentation?
> >
> > Nadya, All,
> > I have ranked the quality of Doxygen-generated DRLVM documentation and
> > posted it to the following Wiki page:
> >
> > http://wiki.apache.org/harmony/DRLVM_Documentation_Quality
> >
> > All are welcome to check masterpieces of our documentation. All
> > volunteers are welcome to improve page ranks by editing comments in
> > DRLVM sources.
> >
> > With best regards,
> > Alexei Fedotov,
> > Intel Java & XML Engineering
> >
>

-- 
Egor Pasko
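
For illustration only, the post-processing asked about in this thread (grouping ranked pages by component, skipping autogenerated listings, and reporting the share of pages to revise) might look roughly like the sketch below. It is not the check_doc_quality.sh logic: the input shape (page path mapped to a numeric rank), the use of the first directory as the "component", the `good_rank` cutoff, and all function names are assumptions made up for this example. Only the two autogenerated file names and the 2680-of-2922 figure come from the messages above.

```python
from collections import Counter, defaultdict

# Hypothetical input: page path -> rank. The real script's output format
# is not shown in this thread, so this shape is an assumption.
SAMPLE_RANKS = {
    "gc/collector.html": 12,
    "gc/todo.html": 0,
    "jit/compiler.html": 0,
    "jit/functions_vars_0x68.html": 0,
    "vmcore/threads.html": 3,
    "index.html": 0,
}


def is_autogenerated(page):
    """Guess whether a page is an autogenerated listing (assumed rule)."""
    name = page.rsplit("/", 1)[-1]
    return name == "todo.html" or name.startswith("functions_")


def percent(part, whole):
    """Return part/whole as a percentage, or 0.0 for an empty total."""
    return 100.0 * part / whole if whole else 0.0


def summarize(ranks, good_rank=1):
    """Count 'ok' and 'needs work' pages per component.

    The component is taken to be the first directory in the page path
    (an assumption); pages without a directory go to '(top level)'.
    """
    by_component = defaultdict(Counter)
    for page, rank in ranks.items():
        if is_autogenerated(page):
            continue  # rating is N/A for generated listings
        component = page.split("/", 1)[0] if "/" in page else "(top level)"
        bucket = "ok" if rank >= good_rank else "needs work"
        by_component[component][bucket] += 1
    return {name: dict(counts) for name, counts in by_component.items()}


if __name__ == "__main__":
    for component, counts in sorted(summarize(SAMPLE_RANKS).items()):
        ok = counts.get("ok", 0)
        bad = counts.get("needs work", 0)
        print(f"{component}: {ok} ok, {bad} need work "
              f"({percent(bad, ok + bad):.0f}% to revise)")
```

Applied to the totals quoted above, `percent(2680, 2922)` gives about 91.7, which is the same share of pages to revise. Per-component counts like these could feed a pie chart or a regularly updated Wiki table.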