On Wednesday 2009.01.28, at 01:11 , John Wilson wrote:
> On Wed, Jan 28, 2009 at 5:19 AM, Charles Oliver Nutter
> <[email protected]> wrote:
>>
>> John D. Mitchell wrote:
>>> One metric that is relatively telling in the broader view is the
>>> number of job listings mentioning a given language.
>>>
>>> However, such a direct, first-order measure misses the importance of
>>> the influence of languages on other languages, tools, etc.
>>
>> I think TIOBE uses job metrics as one of their indicators. It seems like
>> a reasonably good indicator that a language "has been" adopted to some
>> level, but probably not much of an indicator of languages on their way
>> to being adopted. For example, we all know Lisp is going to take over
>> the programming language world any day now, and it's only in 23rd place
>> on TIOBE. Meanwhile, COBOL, which is dying a slow death, is in 17th
>> place. So I'd say job numbers are a lagging indicator at best...

They are all "bad" indicators for a variety of reasons...

> TIOBE say they use search engine metrics using the search term
> "<language> programming" with some language specific post processing
> (see http://www.tiobe.com/index.php/content/paperinfo/tpci/tpci_definition.htm).
> Once the percentage score for a language falls below 5% I don't think
> the numbers are significant. Their longer term trends look to be more
> valuable and show just how jittery the metric is (I'm sure the actual
> usage of established languages does not exhibit this degree of jitter.
> What we are seeing is an artefact of the metric. And what happened in
> 2004! http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html).

Indeed! Raw text search metrics on their own are wildly unreliable.
One nasty effect is that the low-end numbers often get lost in the
crawling and indexing.

Also note that the search engines don't get behind the gated areas,
so they will tend to under-represent some areas and over-represent
others.

On the search engine front, it's more interesting to use one of the
specialized search engines such as Krugle to look at things like the
activity of the projects in the various languages, number of projects
in a language, number of (active) committers, the ever popular SLOC,
etc.  [ObDisclosure: I was the Chief Architect.]  But, of course,
those numbers are hard to normalize across wildly different
languages/problem domains/etc.  Also, of course, a public engine like
Krugle.org only looks through publicly accessible projects, so its
numbers skew away from the proprietary areas.


> "Tim O'Reilly"'s (actually, the latest one I can find is from Mike
> Hendrickson 
> http://radar.oreilly.com/2008/03/state-of-the-computer-book-mar-23.html)
> analysis of the IT book market is based on hard data but, of course,
> is not a direct measure of the usage of languages. There are obvious
> problems with correlating book purchase with language use (Somebody
> who has been writing Fortran for 20 years is probably not going to be
> buying a book on Fortran this quarter. Very few of the people buying
> Haskell books this quarter will be using the language for serious
> work).

Indeed.

Book sales are driven much less by actual usage than they are by hype.
So, in that sense they can be useful as a leading indicator, but they
aren't good at all for measuring actual usage.

Also, look at how much of what was historically content for technical
books is now freely available on the web.  Reference materials and
Q&A forums abound.

Another issue is the whole domestic US & western markets vs. the rest
of the world.  Because of things like cost and translations,
international book sales are completely out of touch with the usage
numbers.

> Job ads are to be treated with some suspicion. Recruiters just love
> keywords, I'm not sure there's a very high correlation between the
> language skills in the ad and the language actually used:)

I think that's a statistically safe correlation (since the recruiters  
are just getting that from the company).

The job posting indicators will over-inflate some numbers (e.g.,  
phantom postings and multiple-postings for a single actual position)  
and under-represent others (e.g., startups, side-projects, etc.).
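
To make the multiple-postings inflation concrete, here is a minimal
sketch, assuming hypothetical posting records with "company", "title"
and "language" fields (none of these names come from a real job board
API, and the sample data is made up for illustration).  It only
collapses reposts that differ in case or whitespace; it does nothing
about phantom postings or the under-represented cases.

```python
from collections import Counter


def count_positions(postings):
    """Count distinct positions per language, collapsing repeat postings.

    Each posting is a dict with hypothetical "company", "title" and
    "language" keys; two postings are treated as the same position when
    all three match after trimming whitespace and lowercasing.
    """
    seen = set()
    counts = Counter()
    for p in postings:
        key = (
            p["company"].strip().lower(),
            p["title"].strip().lower(),
            p["language"].strip().lower(),
        )
        if key in seen:
            continue  # same position posted again
        seen.add(key)
        counts[key[2]] += 1
    return counts


# Illustrative, made-up data: the first two entries are one position.
postings = [
    {"company": "Acme", "title": "Backend Developer", "language": "Java"},
    {"company": "ACME ", "title": "backend developer", "language": "java"},
    {"company": "Initech", "title": "Systems Programmer", "language": "COBOL"},
]

raw = Counter(p["language"].strip().lower() for p in postings)
deduped = count_positions(postings)
print("raw:", dict(raw))
print("deduped:", dict(deduped))
```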

> I think that any method which uses internet searches is going to be
> pretty unreliable. You are measuring what people are talking about not
> what they are using. For example imagine using that method to measure
> mobile phone usage - I think you'd come to the conclusion that 75% of
> the population use the iPhone.
>
> For languages which are of direct interest to this list I think the
> best metric is mailing list usage (markmail.org is very good for this)

Thanks for mentioning MarkMail!

Mailing list stats are particularly interesting to me, obviously
[ObDisclosure: Mad Scientist of MarkMail :-)] but be careful with them
as they have a variety of artifacts, too.  For example, some
communities show tremendous mailing list volume growth that then
flattens and declines even as the language becomes increasingly
popular -- in one case because the community moved to a forum-based
Q&A without a gateway to the mailing list.  Also, different
communities have different cultures and so it's hard to normalize
across them -- e.g., some communities have lots of talkers on a
regular basis and some are more sedate and/or bursty.

Have fun,
John



--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "JVM 
Languages" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/jvm-languages?hl=en
-~----------~----~----~----~------~----~------~--~---
