>> What happened to:
>> <span class="money"><abbr class="currency" title="USD">$</abbr><span class="amount">5.99</span></span>
I brought up the issue of the markup being large and complex to implement, and so we were discussing suggestions about how to potentially streamline the markup.

-Mike

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Stephen Paul Weber
Sent: Wednesday, October 18, 2006 7:55 PM
To: Microformats Discuss
Subject: Re: title attribute and abbreviatedclassnames(Was:[uf-discuss]Currency Quickpoll: Preliminary results)

On 10/18/06, Mike Schinkel <[EMAIL PROTECTED]> wrote:
> >> <span class="money" title="USD">$5.99</span>
> >> I still think this is bad semantics. I don't think "USD" is really a title for "$5.99".
>
> I'll accept that.
>
> >> I'd propose this as an alternative:
> >> <abbr class="currency" title="USD">$</abbr>5.99

What happened to:

<span class="money"><abbr class="currency" title="USD">$</abbr><span class="amount">5.99</span></span>

Does that solve the whole problem and give us extra usefulness at the same time? (Sorry for leaving a discussion and then just jumping back in again. Ignore me if I make no sense.)

>
> Okay... But is it a good idea to have a microformat as a prefix/suffix instead of as a container? (General question; I hope it hasn't been answered before...)
>
> If so, you'll also need (note the space after 35.66):
>
> 35.66 <abbr class="currency" title="DKK">kr</abbr>
>
> However, at the risk of being shot for heresy, has anyone considered allowing this?
>
> <abbr class="currency usd">$5.99</abbr>
> <abbr class="currency dkk">35.66 kr</abbr>
>
> OR (something tells me this is even worse, but...):
>
> <abbr class="money currency-usd">$5.99</abbr>
> <abbr class="money currency-dkk">35.66 kr</abbr>
>
> I'm sure there is something just so wrong about this, but part of the reason I'm on this list is to learn. So why not?
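The class-name variants floated above (class="money currency-usd" and class="currency usd") would move the ISO 4217 code into the class attribute. As a rough sketch of what a consumer would then have to do, here is a hypothetical JavaScript helper (`currencyFromClass` is an illustrative name, not anything from a spec) that pulls the code out of either form. It assumes codes are exactly three ASCII letters.

```javascript
// Hypothetical helper: extract an ISO 4217 code from the class-name forms
// suggested above. Neither form is part of any published spec.
function currencyFromClass(classAttr) {
  const tokens = classAttr.trim().split(/\s+/);

  // Form 1: class="money currency-usd" (a single "currency-xxx" token).
  for (const token of tokens) {
    const m = /^currency-([a-z]{3})$/i.exec(token);
    if (m) return m[1].toUpperCase();
  }

  // Form 2: class="currency usd" (a bare "currency" token plus a
  // separate three-letter token somewhere in the same attribute).
  if (tokens.includes("currency")) {
    const code = tokens.find((t) => /^[a-z]{3}$/i.test(t));
    if (code) return code.toUpperCase();
  }

  return null; // no currency code found
}

console.log(currencyFromClass("money currency-usd")); // USD
console.log(currencyFromClass("currency dkk"));       // DKK
```

In the second form, any other three-letter class name on the same element (a styling class such as "big", say) would be misread as a currency code, so that form relies on every other class on the element avoiding that shape.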
> Additionally, that would allow:
>
> <abbr class="currency usd" title="5.99">Five Dollars and 99 cents</abbr>
> <abbr class="currency dkk" title="35.66">Thirty Five point 66 Kroners</abbr>
>
> OR (for orthogonality):
>
> <abbr class="money currency-usd" title="5.99">Five Dollars and 99 cents</abbr>
> <abbr class="money currency-dkk" title="35.66">Thirty Five point 66 Kroners</abbr>
>
> Just a thought...?
>
> -Mike
> P.S. *** I wish HTML had allowed "rel" for all tags, including <span> and <abbr>. Or that we could just use it anyway and not get shot for heresy. :)
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Scott Reynen
> Sent: Tuesday, October 17, 2006 10:30 AM
> To: Microformats Discuss
> Subject: Re: title attribute and abbreviated classnames(Was:[uf-discuss]Currency Quickpoll: Preliminary results)
>
> I've started replying to this a few times and gotten stuck trying to fit what I'm trying to say into the existing thread, so I'm just going to make some points completely detached from the thread.
>
> First, I think Mike is right that the vast majority of published money formats allow parsers to infer the distinction between the currency symbol and the amount. But this inference is already possible without a microformat. What's missing currently is:
>
> 1) an indication of which specific currency the symbol refers to, and
> 2) the ability to mark up money that doesn't fit this pattern.
>
> I think it's best to cover either #1 or both, but I think it's too complicated for publishers to provide what amounts to two distinct microformats depending on a relatively complex pattern definition. That is, if we're going simple (only #1), I think we should go only simple, and add the complex form to cover #2 later.
>
> So to cover #1, Mike has suggested:
>
> <span class="money" title="USD">$5.99</span>
>
> I still think this is bad semantics.
> I don't think "USD" is really a title for "$5.99". I'd propose this as an alternative:
>
> <abbr class="currency" title="USD">$</abbr>5.99
>
> That is, mark up the currency as currency, and treat any adjacent numbers as the amount.
>
> To cover #2, I think we need an additional class="money" container and class="amount" markup for the amount, and this could be added without changing the parsing rules for the simple form I've suggested above. I think it would be best to start with either simple or complex and look at adding the alternative after the microformat has gained some adoption.
>
> I don't think regular expressions should be included in the spec at all. If we're going to define amounts based on character ranges, we should describe those character ranges in plain English, because most people, even most tech geeks, don't understand regular expressions at all.
>
> Peace,
> Scott
>
> On Oct 15, 2006, at 4:40 PM, Mike Schinkel wrote:
>
> > Scott:
> >
> > Thanks for the reply. It probably got confusing on my part; I will try to resolve that here if possible.
> >
> > >>> I thought what you suggested was to allow for explicit differentiation between the currency identifier and the amount, but in certain cases where such differentiation can be made by matching a regular expression, allow for markup without explicit differentiation, leaving the differentiation implicitly to the parser to figure out. For example, this would be valid:... because it does follow the pattern, where everything that's not within a certain character group is considered a currency symbol (i.e. "$"). If this isn't what you're suggesting, then I'm not clear on what you're suggesting.
> >
> > You got it 100%. But I did make a mistake in my example, as I didn't mean to include alpha characters [A-Za-z]. It should just have been digits, periods, and commas [0-9\.\,]; everything else would be the currency symbol.
> > I wasn't explicit about the following, but I will be now: no spaces (or &nbsp;), and the currency symbol must be contiguous with, and either prefix or suffix, a collection of digits. Anything else, and you need the complexity.
> >
> > Although I am not good with regex, I opened my regex book and my regex tester and crafted this regex, which I think identifies 100% of the special case to which I referred:
> >
> > ^([^0-9,\. ]*)([0-9]+[\.,]?[0-9]*)([^0-9,\. ]*)$
> >
> > In that regex, if $2 has a value, that's the amount. If $1 OR $3 has a value, then it's the symbol. If it doesn't match, you *must* use the complex form. (BTW, it would also be really easy to write a recursive descent and/or a looping parser in JavaScript or other languages to parse this, and we could publish those reference implementations.) We publish the regex (or a better-written one) and the recursive descent parsers and say: if it matches, you can use the simple form; otherwise, use the complex one.
> >
> > So the following could use the simple form:
> >
> > The book is <span class="money" title="USD">$5.99</span>.
> > In Brazil, the book would be <span class="money" title="BRL">R$12.84</span>.
> > In Denmark, the price would be <span class="money" title="DKK">35.66kr</span>.
> >
> > BTW, it wouldn't be hard to include spaces in the regex, and it might be a good idea to go ahead and do that. If so, you can use the JavaScript replace() string function (or similar in other languages) to first normalize the string so that it contains only real spaces and no &nbsp;, like so:
> >
> > s = s.replace(/&nbsp;/g, " ")
> >
> > where "s" is the inner HTML of the <span>, and then use this regex on the result:
> >
> > ^([^0-9,\. ]*)[ ]?([0-9]+[\.,]?[0-9]*)[ ]?([^0-9,\. ]*)$
> >
> > where again $1 OR $3 will be the symbol and $2 will be the amount. That would make these possible:
> >
> > The book is <span class="money" title="USD">$ 5.99</span>.
> > In Brazil, the book would be <span class="money" title="BRL">R$ 12.84</span>.
> > In Denmark, the price would be <span class="money" title="DKK">35.66 kr</span>.
> >
> > Yes, it is a little more difficult for the person writing the parser, but there will be orders of magnitude more people writing the HTML than writing parsers, and besides, we can provide a working regex and reference implementation functions that will be good for 99% of cases and just say "Here; use it!"
> >
> > >>> http://regexlib.com/Search.aspx?k=currency
> >
> > I reviewed that, and it appears that most of the regexes submitted do essentially the same thing, each correcting for something the others didn't do (like handling leading zeros); did I misread?
> >
> > >>> and I think it's only helping a slight majority that is quickly becoming a minority. English language web pages only comprise about 55% of the web today, and that percent is quickly shrinking. So I'm publishing my currency in English, and you're trying to ease my implementation burden, so I don't have to explicitly define my currency symbol and parsers will just figure it out for me.
> >
> > I respectfully think it won't be in the minority; I think it will be the vast majority. And it will work in languages other than English, such as German, Spanish, French, Portuguese, Russian, Arabic, and so on; any that use digits plus periods/commas for representing numbers. It seems the only languages in any significant use that it doesn't work for are those written in multibyte characters, which will require the complexity because, frankly, they are complex.
> >
> > >>> I think this is already more confusing than just always identifying the individual parts, I think it's still likely to cause problems, ...
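Mike's space-tolerant pattern can be exercised directly. The following is a minimal JavaScript sketch, assuming the input is the raw inner HTML of the class="money" element; `parseSimpleMoney` is a hypothetical name, and rejecting text with a symbol on both sides is an assumption of this sketch, since the message only says "$1 OR $3".

```javascript
// Mike's space-tolerant pattern, copied from the message above:
// group 1 = prefix symbol, group 2 = amount, group 3 = suffix symbol.
const SIMPLE_MONEY = /^([^0-9,\. ]*)[ ]?([0-9]+[\.,]?[0-9]*)[ ]?([^0-9,\. ]*)$/;

// Returns { symbol, amount } when the text fits the simple form,
// or null when the author would need the complex form instead.
function parseSimpleMoney(innerHtml) {
  // Normalize the &nbsp; entity (and a literal U+00A0) to a real space.
  // The /g flag matters: without it, only the first match is replaced.
  const s = innerHtml.replace(/&nbsp;|\u00A0/g, " ");
  const m = SIMPLE_MONEY.exec(s);
  if (m === null) return null;

  const [, prefix, amount, suffix] = m;
  // Assumption: a symbol on both sides is ambiguous, so reject it.
  if (prefix && suffix) return null;
  return { symbol: prefix || suffix, amount };
}

console.log(parseSimpleMoney("R$ 12.84"));
console.log(parseSimpleMoney("35.66&nbsp;kr"));
console.log(parseSimpleMoney("$1,234.56"));
```

As written, the amount group allows at most one separator, so a value such as "$1,234.56" does not match and would fall back to the complex form.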
> >
> > Requiring identification of individual parts is less confusing in the abstract because you don't assume anything, but it is more difficult to learn because it requires everyone who implements it to grok the entire spec to be able to use it. By offering a simpler version, (I assert that) most people won't have to learn all of the details, because they will just use the simple version. So it could be described like this:
> >
> >     The Money microformat has a simple version that applies in most cases, and a complex version for when you really need control or if you are using multibyte character sets. You can use the simple version if the markup to which you want to add this microformat is limited to:
> >     1.) currency symbols (e.g. $, £, etc.),
> >     2.) spaces,
> >     3.) digits (i.e. 0-9), and
> >     4.) decimal separators (comma "," or period ".")
> >
> >     For example:
> >
> >     The book is <span class="money" title="USD">$ 5.99</span>.
> >     In Brazil, the book would be <span class="money" title="BRL">R$ 12.84</span>.
> >     In Denmark, the price would be <span class="money" title="DKK">35.66 kr</span>.
> >
> >     If, however, you want to mark up money represented in much more complex ways, you'll need to use the more complex version, for example:
> >
> >     <p class="money">It'll cost you <abbr class="amount" title="50.00">fifty</abbr> <abbr class="currency" title="GBP">quid</abbr>, mate!</p>
> >
> >     <span class="money">Can you spare <abbr class="amount" title="10">ten</abbr> <abbr class="currency" title="USD"><span class="unit">dollars</span></abbr>?</span>
> >
> > By describing it this way, people who can use the simple version are never even required to drill down and learn the complex way. This seems infinitely easier for the vast majority of people than having to grok the entire spec right off the bat.
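The plain-English eligibility rule Mike proposes above can also be checked mechanically. This sketch reads "currency symbols" as the Unicode Sc (Symbol, currency) category, which is an assumption rather than anything stated in the thread; `fitsSimpleForm` is a hypothetical name.

```javascript
// Mike's plain-English rule, read literally: only currency symbols, spaces,
// digits 0-9, and the separators "," and "." are allowed. "Currency symbol"
// is taken to mean Unicode category Sc (an assumption, not from the thread).
const SIMPLE_FORM_CHARS = /^[\p{Sc} 0-9.,]+$/u;

function fitsSimpleForm(text) {
  // Also require at least one digit, so a lone "$" does not qualify.
  return SIMPLE_FORM_CHARS.test(text) && /[0-9]/.test(text);
}

for (const t of ["$ 5.99", "£3.50", "35.66 kr", "R$ 12.84"]) {
  console.log(JSON.stringify(t), fitsSimpleForm(t));
}
```

Under that literal reading, "35.66 kr" and "R$ 12.84" fail because "k", "r", and "R" are letters rather than Sc characters, so a spec built on this wording would need to widen "currency symbols" to cover the thread's own examples.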
> > Frankly, when I first saw it I thought, "It isn't really going to be this complex, is it?" I thought the theme behind microformats was "Make the simplest addition to HTML markup required." That's one of the reasons I was so drawn to the initiative.
> >
> > I actually think you'll end up with more invalid microformats if people are required to implement the current proposal, because it is complex enough that it would be relatively easy for someone to get wrong. By having a simpler format, you'll minimize the chance that those people get it wrong, those who do go to the more complex form are more likely to really study it and get it right, and fewer people will overload the experts with questions about it (IMO).
> >
> > Question: Maybe we should vet this with typical web developers who are NOT involved with the microformats initiative? We could go out and ask workaday web site developers and web site maintainers their opinion on what is easier to comprehend. Honestly, I'm giving my opinion, but I could find out my opinion is in a tiny minority. Or vice versa.
> >
> > BTW, is there a plan to create a series of microformat validator pages where someone could go and enter a URL and have it extract all the data it found for a given microformat? Without this, I think people will end up creating lots of pages with invalid microformats. And it would need to be done for *each* microformat.
> >
> > >>> There are people from Yahoo! on this list, and Technorati's pretty big too, so they'd be good people to say whether or not they really care how long the class names are.
> >
> > Yeah, I already said "Okay, concern addressed" in an earlier reply.
> >
> > Anyway, I'm hoping that my earlier mistake of including [A-Za-z] was the main reason you objected, and that you'll agree with a small-scope minimum form like the one I'm proposing.
> >
> > -Mike Schinkel
> > http://www.mikeschinkel.com/blog
> > http://www.welldesignedurls.org/
> >
> > P.S. On another note, another question just occurred to me: why are you using "money" and not "hMoney"?
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Scott Reynen
> > Sent: Saturday, October 14, 2006 10:39 PM
> > To: Microformats Discuss
> > Subject: Re: title attribute and abbreviated class names(Was:[uf-discuss]Currency Quickpoll: Preliminary results)
> >
> > On Oct 14, 2006, at 3:27 PM, Mike Schinkel wrote:
> >
> >>>> Your examples seem to leave a lot of ambiguity about what things mean,
> >>
> >> I'm new to proposing microformats, so I clearly have a lot to learn, but that said, I don't see where what I was proposing was ambiguous. Can you give me explicit examples where allowing default assumptions for the most common use cases will by necessity lead to ambiguity? It seems to me that either something will be specified or, if not, it will default. That seems unambiguous to me. Am I wrong?
> >
> > I'm not entirely sure we're talking about the same thing anymore, after reading this exchange:
> >
> > On Oct 14, 2006, at 3:55 PM, Mike Schinkel wrote:
> >
> >>>> That said, why not make the "symbol" markup optional?
> >>
> >> That, IMO, is an additional good idea.
> >
> > I thought that was basically what you were advocating, but you called it an /additional/ good idea, so I'm not sure what it's an addition to. I thought what you suggested was to allow for explicit differentiation between the currency identifier and the amount, but in certain cases where such differentiation can be made by matching a regular expression, allow for markup without explicit differentiation, leaving the differentiation implicitly to the parser to figure out.
> > For example, this would be valid:
> >
> > 本が<span class="money"><abbr class="amount" title="1000">一千</abbr><abbr class="currency" title="JPY">円</abbr></span>
> > [English: "The book is one thousand yen."]
> >
> > because it doesn't fit the pattern you suggested, but this would also be valid:
> >
> > The book is <span class="money">$5.99</span>.
> >
> > because it does follow the pattern, where everything that's not within a certain character group is considered a currency symbol (i.e. "$"). If this isn't what you're suggesting, then I'm not clear on what you're suggesting.
> >
> > But if this is what you're suggesting, I think you're underestimating the complexity involved in defining which characters might be part of an amount and which characters might be part of a currency symbol. I do a lot of parsing via regular expressions, and a large part of my interest in microformats comes from witnessing the failure rate in such parsing. There's always another unexpected format popping up, and before you know it, the regular expression is a page long. For a quick taste, see this page, which lists regular expressions for identifying the information that needs to be parsed from currency values:
> >
> > http://regexlib.com/Search.aspx?k=currency
> >
> > And those are all defining legitimate input much more strictly than would be appropriate for the web at large.
> >
> > To specifically answer your question of what doesn't work with [A-Za-z0-9]: there's the decimal point, which is part of the amount rather than the currency symbol; there are any commas, which are also part of the amount rather than the currency symbol; and there are whitespace characters (of which there are many), which shouldn't be considered part of either the amount or the currency symbol. That's all I can think of right now, but I have no doubt there's much more I haven't thought of, and it's that much more I'm worried about.
> > So if we come up with a definition that includes all of that, now we're talking about explaining to authors that they can only leave out the currency markup if their class="money" tag contains only letters, numbers, decimal points, commas, and whitespace. Otherwise they have to explicitly identify the individual parts.
> >
> > I think this is already more confusing than just always identifying the individual parts, I think it's still likely to cause problems, and I think it's only helping a slight majority that is quickly becoming a minority. English language web pages only comprise about 55% of the web today, and that percent is quickly shrinking. So I'm publishing my currency in English, and you're trying to ease my implementation burden, so I don't have to explicitly define my currency symbol and parsers will just figure it out for me. What if I want my whitespace to be marked up with HTML entities? E.g.:
> >
> > The book costs <span class="money">$&nbsp;5.99</span>
> >
> > That's not an unlikely scenario. I actually publish currency values like that, when someone wants a space to separate the $ from the amount but doesn't want the two getting split onto separate lines. Are we going to include that in the regular expression too, or do I need to explicitly identify my symbol? If it's not allowed, how will that be explained clearly enough that I won't make this mistake and wind up with my currency symbol wrongly interpreted as "$&nbsp;", which doesn't map to any known currency, and lose my space if it's replaced by another currency symbol? This is the kind of ambiguity that doesn't really help publishers. And if it is in the regular expression, how are we going to explain to publishers that it's okay? Looks like unnecessary complication to me.
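Scott's &nbsp; scenario can be reproduced with a naive prefix parser that treats everything before the first digit as the symbol. The sketch below is hypothetical (the `KNOWN_SYMBOLS` set and both function names are illustrative) and handles prefix symbols only; it assumes the entity has already been decoded to U+00A0, as it would be when reading an element's text rather than its HTML.

```javascript
// Hypothetical set of recognized display symbols, for illustration only.
const KNOWN_SYMBOLS = new Set(["$", "kr", "R$"]);

// Naive prefix parse: everything before the first ASCII digit is the symbol.
// Returns null when the text has no digit at all.
function naiveSymbol(text) {
  const i = text.search(/[0-9]/);
  return i === -1 ? null : text.slice(0, i);
}

// Same parse, but drops every character JavaScript's \s class matches,
// which includes U+00A0 (the decoded &nbsp;) and U+2009 (thin space).
function cleanedSymbol(text) {
  const raw = naiveSymbol(text);
  return raw === null ? null : raw.replace(/\s+/g, "");
}

const input = "$\u00A05.99"; // what "$&nbsp;5.99" looks like once decoded
console.log(KNOWN_SYMBOLS.has(naiveSymbol(input)));   // false
console.log(KNOWN_SYMBOLS.has(cleanedSymbol(input))); // true
```

Stripping with \s covers U+00A0 and other Unicode space characters such as U+2009, which is one concrete way to handle the "many" whitespace characters Scott mentions; whether a spec should require every parser to do this is the open question in the thread.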
> >
> >> But one final point on this: has this been discussed with those who make the decisions for markup used at the largest sites (Google, eBay, Amazon, etc.)? Just curious. (And I don't mean to push this; it's just that being pedantic is in my nature, unfortunately. :)
> >
> > There are people from Yahoo! on this list, and Technorati's pretty big too, so they'd be good people to say whether or not they really care how long the class names are.
> >
> > Peace,
> > Scott
> >
> _______________________________________________
> microformats-discuss mailing list
> microformats-discuss@microformats.org
> http://microformats.org/mailman/listinfo/microformats-discuss
>
--
- Stephen Paul Weber, Amateur Writer
<http://www.awriterz.org>

MSN/GTalk/Jabber: [EMAIL PROTECTED]
ICQ/AIM: 103332966
NSA: [EMAIL PROTECTED]
BLOG: http://singpolyma-tech.blogspot.com/