Re: [whatwg] summary tag to help avoid redundancy of meta description tag

2010-07-27 Thread Ian Hickson
On Thu, 18 Mar 2010, Roger Hågensen wrote:
 
 On my own site currently I mostly replicate the first paragraph of an 
 article in my journal as the meta description, and write one up for 
 other pages, usually replicating some of the content.
 
 I'm both looking for and want a solution to avoid such redundancy.

The simplest solution is to just not include a description, and rely on 
tools to determine automatically what the most relevant information on the 
page is.


 The perfect solution would be a summary tag, if you look at the 
 journal articles on my site you can imagine the first paragraph being 
 done like this:
 
 <p><summary>This is just an example, it's a replacement for the old meta 
 description, and is a brief summary (description) of the page 
 (content)</summary></p>
 
 This way the first paragraph in a page would remain unchanged from how 
 it is done today, and a search engine like Google or screen readers etc. 
 would use the summary tag instead of the meta description (which is no 
 longer needed at all in cases like this), if more than one summary tag 
 the first is considered the page summary one, while the others are 
 ignored (but still shown as content obviously).

That, or an attribute, would be a reasonable solution, but I'm not really 
convinced the problem is that important.


On Thu, 18 Mar 2010, Roger Hågensen wrote:
 
 Example using HTML5 microdata: (would this be appropriate, would browser 
 devs, and Google and other search engines support this?)

You _could_ use microdata to do this, but I don't think it's really a 
great use of microdata. This kind of thing would be better done as a 
microformat, e.g. using a well-known class value.


On Fri, 19 Mar 2010, Ashley Sheridan wrote:
 
 Why not just use server-side code to output the first paragraph of 
 content as the description for the page also?

That is indeed another possible solution to avoid hand-authoring 
duplicate content.
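
A minimal sketch of that server-side approach, in Python with only the standard library. The names `FirstParagraph` and `meta_description` and the 160-character cut-off are assumptions made for illustration; nothing in this thread prescribes them. The sketch takes the text of the first explicitly closed <p> element and emits a meta description from it:

```python
from html import escape
from html.parser import HTMLParser


class FirstParagraph(HTMLParser):
    """Collect the text of the first <p> element and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.inside = False   # True while inside the first <p>
        self.done = False     # True once the first </p> has been seen
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "p" and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def meta_description(body_html, limit=160):
    """Build a meta description tag from the first paragraph of body_html.

    `limit` is an arbitrary illustrative cut-off, not a value from any spec.
    """
    parser = FirstParagraph()
    parser.feed(body_html)
    parser.close()
    text = " ".join("".join(parser.parts).split())  # collapse whitespace
    if len(text) > limit:
        text = text[:limit].rstrip() + "..."
    return '<meta name="description" content="%s">' % escape(text, quote=True)
```

The page template would call `meta_description()` on the article body when rendering the head, so the author writes the paragraph only once. The duplicated bytes still go over the wire, which is the part of the complaint this approach does not address.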


On Fri, 19 Mar 2010, Roger Hågensen wrote:
 
 http://lists.w3.org/Archives/Public/public-html/2009Aug/0990.html
 suggests <link rel="description" href="#desc" />, which is ok I guess.
 
 But why not simply allow this instead:
 <meta name="description" href="#desc" />
 
 Existing parsers would notice that content= is missing which is stated 
 as being required, parsers that have been updated would notice there is 
 a href= instead, so search engines could just look for that id in the 
 page. I think this would have the highest success rate.
 
 If backwards compatibility is such a major concern then this could be 
 done: <meta name="description" content="" href="#desc" />
 
 I'm unsure what gives the best result for various parsers though, would 
 empty content make them behave the same as if the meta tag was not there 
 at all? Or would an empty tag cause them to use "" as the actual page 
 description?
 
 I'd prefer to have the content attribute missing instead myself, but...

<link> is the right element for links, <meta> for text data. Either way, 
though, the right way to address this is to convince implementors (such as 
a search engine developer) that they should follow these links and get the 
description from them. That is an early step in changing the spec:

   
http://wiki.whatwg.org/wiki/FAQ#Is_there_a_process_for_adding_new_features_to_a_specification.3F
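
To make concrete what an implementor would be asked to do, here is a rough Python sketch of a consumer that honours the proposed href on a meta description. The href attribute on meta is only a proposal from this thread, not part of any specification, and `DescriptionResolver` and `resolve_description` are hypothetical names. The sketch assumes elements are explicitly closed and lets a non-empty content attribute win, which matches the backwards-compatible variant suggested above:

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not go on the open stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class DescriptionResolver(HTMLParser):
    """Record the meta description and the text of every element with an id."""

    def __init__(self):
        super().__init__()
        self.content = None    # value of content= on the meta, if any
        self.target_id = None  # id named by the proposed href="#..."
        self.texts = {}        # id -> list of text chunks inside that element
        self.open = []         # stack of (tag, id or None) for open elements

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "description":
            self.content = a.get("content")
            href = a.get("href") or ""
            if href.startswith("#"):
                self.target_id = href[1:]
        if tag in VOID:
            return
        el_id = a.get("id")
        if el_id is not None:
            self.texts.setdefault(el_id, [])
        self.open.append((tag, el_id))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, if there is one.
        for i in range(len(self.open) - 1, -1, -1):
            if self.open[i][0] == tag:
                del self.open[i:]
                break

    def handle_data(self, data):
        for _, el_id in self.open:
            if el_id is not None:
                self.texts[el_id].append(data)


def resolve_description(page_html):
    """Return the page description, or None if the page offers none."""
    r = DescriptionResolver()
    r.feed(page_html)
    r.close()
    if r.content:                      # a non-empty content= wins, as today
        return r.content
    if r.target_id in r.texts:         # otherwise follow the in-page reference
        return " ".join("".join(r.texts[r.target_id]).split())
    return None
```

Note that the consumer has to read past the head to resolve the reference, which is exactly the overhead raised elsewhere in this thread.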


On Thu, 18 Mar 2010, Roger Hågensen wrote:

 [regarding data-*=] Maybe a better naming would have been: doc-* It's 
 short, it kinda reflect what it's related to as well right? Or does that 
 clash with something?

data-*= is probably too well established to change at this point unless 
there's a really compelling reason.

Cheers,
-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] summary tag to help avoid redundancy of meta description tag!?

2010-03-20 Thread Roger Hågensen

On 2010-03-19 17:19, Roger Hågensen wrote:

On 2010-03-19 15:43, Ashley Sheridan wrote:

On Fri, 2010-03-19 at 15:43 +0100, Roger Hågensen wrote:

On 2010-03-19 15:17, Ashley Sheridan wrote:
  I just feel that the <head> and <body> areas of a page have two
  distinct uses, and unnecessary crossovers shouldn't occur if it's
  avoidable.

If you look at my other thread Re: [whatwg] <meta name="description"
href="#desc" />
It allows notifying the parser that the content is in the page, and it
is up to the parsers configuration whether to scan beyond the header in
that case. Best of both worlds IMO.

Roger.
 
I did see that, and it looks like a great idea, as it shouldn't 
really break anything, and I saw that it should be possible to use 
for the keywords too, which would fit perfectly with tag cloud 
systems used on a page.


I would presume that this would cause the content parser (browser) to 
strip any and all tags surrounding the marked content?


Thanks,
Ash
http://www.ashleysheridan.co.uk



Well, looking at the example 
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-March/025575.html
I remembered that the <title> element may have HTML markup in it (seen 
it in the wild), so most parsers probably apply tag stripping to that 
already,
so yeah, stripping tags the parser does not want shouldn't be an issue 
really.


Just made a feature request article at 
http://wiki.whatwg.org/wiki/Meta_element_href as it's just easier to 
reference that than a mailing list post.
Sorry if it looks messy, I just used the advised template, but it's a 
start at least.

If anyone feels like improving the language, feel free to go nuts.

Roger.

--
Roger Rescator Hågensen.
Freelancer - http://EmSai.net/



Re: [whatwg] summary tag to help avoid redundancy of meta description tag!?

2010-03-19 Thread Roger Hågensen

On 2010-03-18 10:04, Ashley Sheridan wrote:
The main problem with that would be that parsers would then need to 
read into the body of the page to produce a description of your 
site. This might not produce much of an overhead on a one-off basis, 
but imagine a parser that is grabbing the description from hundreds or 
thousands of pages, then this could become a bit of a problem.


I do not see how that is any more or less of a problem than today with 
pages that have meta description missing,
what do those parsers do then? Do they stop at </head>? What do they 
use as description instead? The first paragraph?
The parsers used by all major search engines certainly do not halt, they 
break down the entire page right?


As for delays, that is not an issue for consumers, I can not recall any 
browser ever showing me the meta description unless I explicitly view 
page properties.
I can imagine that the seeing impaired community would love something 
like this, as it would basically tell screenreaders that this is the 
first paragraph/summary/description/teaser of the page,

allowing blind people to more rapidly jump from page to page.

Currently the meta description is not always good content, would be 
interesting to see a Google analysis of how the meta description is used,
i.e. how many are basically repeating page content (like I do) and how 
many just dump keywords in there, how many pages on a site have a site 
wide identical description? And so on.


Roger.

--
Roger Rescator Hågensen.
Freelancer - http://EmSai.net/



Re: [whatwg] summary tag to help avoid redundancy of meta description tag!?

2010-03-19 Thread Ashley Sheridan
On Fri, 2010-03-19 at 13:43 +0100, Roger Hågensen wrote:

 On 2010-03-18 10:04, Ashley Sheridan wrote: 
 
  The main problem with that would be that parsers would then need to
  read into the body of the page to produce a description of your
  site. This might not produce much of an overhead on a one-off basis,
  but imagine a parser that is grabbing the description from hundreds
  or thousands of pages, then this could become a bit of a problem.
 
 
 I do not see how that is any more or less of a problem than today
 with pages that have meta description missing,
 what do those parsers do then? Do they stop at </head>? What do they
 use as description instead? The first paragraph?
 The parsers used by all major search engines certainly do not halt,
 they break down the entire page right?
 
 As for delays, that is not an issue for consumers, I can not recall
 any browser ever showing me the meta description unless I explicitly
 view page properties.
 I can imagine that the seeing impaired community would love something
 like this, as it would basically tell screenreaders that this is the
 first paragraph/summary/description/teaser of the page,
 allowing blind people to more rapidly jump from page to page.
 
 Currently the meta description is not always good content, would be
 interesting to see a Google analysis of how the meta description is
 used,
 i.e. how many are basically repeating page content (like I do) and how
 many just dump keywords in there, how many pages on a site have a site
 wide identical description? And so on.
 
 Roger.
 
 -- 
 Roger Rescator Hågensen.
 Freelancer - http://EmSai.net/


Search engines and people are not the only content parsers. Sure, you
would expect a parser to maybe look further into the content if the
description meta tag was missing, but imagine if a parser had to do this
for all the content it looked at? There are still overheads to consider.

Why not just use server-side code to output the first paragraph of
content as the description for the page also?

I just feel that the <head> and <body> areas of a page have two distinct
uses, and unnecessary crossovers shouldn't occur if it's avoidable.

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: [whatwg] summary tag to help avoid redundancy of meta description tag!?

2010-03-19 Thread Roger Hågensen

On 2010-03-19 15:17, Ashley Sheridan wrote:
Search engines and people are not the only content parsers. Sure, you 
would expect a parser to maybe look further into the content if the 
description meta tag was missing, but imagine if a parser had to do 
this for all the content it looked at? There are still overheads to 
consider.


Why not just use server-side code to output the first paragraph of 
content as the description for the page also?


I just feel that the <head> and <body> areas of a page have two 
distinct uses, and unnecessary crossovers shouldn't occur if it's 
avoidable.


True, but there is also such a thing as unneeded redundancy, sure 
repeating the same info in the meta tags which is also in the document 
may not add that many KB,
but with an increasing number of page requests that really piles up the 
bandwidth total. Something both users and hosters and ISPs should have 
an interest in right?
If you look at my other thread Re: [whatwg] <meta name="description" 
href="#desc" />
It allows notifying the parser that the content is in the page, and it 
is up to the parsers configuration whether to scan beyond the header in 
that case. Best of both worlds IMO.


Roger.

--
Roger Rescator Hågensen.
Freelancer - http://EmSai.net/



Re: [whatwg] summary tag to help avoid redundancy of meta description tag!?

2010-03-19 Thread Ashley Sheridan
On Fri, 2010-03-19 at 15:43 +0100, Roger Hågensen wrote:

 On 2010-03-19 15:17, Ashley Sheridan wrote:
  Search engines and people are not the only content parsers. Sure, you 
  would expect a parser to maybe look further into the content if the 
  description meta tag was missing, but imagine if a parser had to do 
  this for all the content it looked at? There are still overheads to 
  consider.
 
  Why not just use server-side code to output the first paragraph of 
  content as the description for the page also?
 
  I just feel that the <head> and <body> areas of a page have two 
  distinct uses, and unnecessary crossovers shouldn't occur if it's 
  avoidable.
 
 True, but there is also such a thing as unneeded redundancy, sure 
 repeating the same info in the meta tags which is also in the document 
 may not add that many KB,
 but with an increasing number of page requests that really piles up the 
 bandwidth total. Something both users and hosters and ISPs should have 
 an interest in right?
 If you look at my other thread Re: [whatwg] <meta name="description" 
 href="#desc" />
 It allows notifying the parser that the content is in the page, and it 
 is up to the parsers configuration whether to scan beyond the header in 
 that case. Best of both worlds IMO.
 
 Roger.
 


I did see that, and it looks like a great idea, as it shouldn't really
break anything, and I saw that it should be possible to use for the
keywords too, which would fit perfectly with tag cloud systems used on a
page.

I would presume that this would cause the content parser (browser) to
strip any and all tags surrounding the marked content?

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: [whatwg] summary tag to help avoid redundancy of meta description tag!?

2010-03-19 Thread Roger Hågensen

On 2010-03-19 15:43, Ashley Sheridan wrote:

On Fri, 2010-03-19 at 15:43 +0100, Roger Hågensen wrote:

On 2010-03-19 15:17, Ashley Sheridan wrote:
  Search engines and people are not the only content parsers. Sure, you
  would expect a parser to maybe look further into the content if the
  description meta tag was missing, but imagine if a parser had to do
  this for all the content it looked at? There are still overheads to
  consider.

  Why not just use server-side code to output the first paragraph of
  content as the description for the page also?

  I just feel that the <head> and <body> areas of a page have two
  distinct uses, and unnecessary crossovers shouldn't occur if it's
  avoidable.

True, but there is also such a thing as unneeded redundancy, sure
repeating the same info in the meta tags which is also in the document
may not add that many KB,
but with an increasing number of page requests that really piles up the
bandwidth total. Something both users and hosters and ISPs should have
an interest in right?
If you look at my other thread Re: [whatwg] <meta name="description"
href="#desc" />
It allows notifying the parser that the content is in the page, and it
is up to the parsers configuration whether to scan beyond the header in
that case. Best of both worlds IMO.

Roger.
 
I did see that, and it looks like a great idea, as it shouldn't really 
break anything, and I saw that it should be possible to use for the 
keywords too, which would fit perfectly with tag cloud systems used on 
a page.


I would presume that this would cause the content parser (browser) to 
strip any and all tags surrounding the marked content?


Thanks,
Ash
http://www.ashleysheridan.co.uk



Well, looking at the example 
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-March/025575.html
I remembered that the <title> element may have HTML markup in it (seen it 
in the wild), so most parsers probably apply tag stripping to that already,

so yeah, stripping tags the parser does not want shouldn't be an issue really.
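
For illustration, a minimal Python sketch of that kind of tag stripping, using only the standard library. `TagStripper` and `strip_tags` are names invented here, and this is one simple way to do it rather than what any particular search engine does:

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Keep only character data; every tag is dropped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(fragment):
    """Return the text of an HTML fragment with markup removed."""
    stripper = TagStripper()
    stripper.feed(fragment)
    stripper.close()
    # Collapse runs of whitespace left behind by removed markup.
    return " ".join("".join(stripper.chunks).split())
```

Character references are decoded by the parser, so "Tom &amp; Jerry" comes out as "Tom & Jerry". The text inside script or style elements is kept as ordinary data, so a real consumer would want to skip those elements.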

Roger.

--
Roger Rescator Hågensen.
Freelancer - http://EmSai.net/



Re: [whatwg] summary tag to help avoid redundancy of meta description tag!?

2010-03-18 Thread Roger Hågensen

On 2010-03-18 03:37, Roger Hågensen wrote:

I know, replying to myself is a big no-no... *cough*

I searched the list, and looked at the HTML5 briefly and found 
nothing, nor can I ever recall such.

So this is both a question and a proposal.

On my own site currently I mostly replicate the first paragraph of an 
article in my journal as the meta description,
and write one up for other pages, usually replicating some of the 
content.


I'm both looking for and want a solution to avoid such redundancy.


I kept searching after posting that and looked more into HTML5 and 
microdata...
Besides a small aneurysm while trying to understand the darn thing I did 
find a possible solution, but is it valid?


Example using HTML5 microdata:
(would this be appropriate, would browser devs, and Google and other 
search engines support this?)


The following...

<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Microdata replacing metadata example.</title>
</head>
<body>
<article>
<header>Section header.</header>
<p itemprop="#description">This is the first paragraph in the document 
or an aside or some other content perhaps.</p>

<p>More content here.</p>
<footer>Author: <a href="example.com/author/url/" 
itemprop="#author">Roger Hågensen</a> on <time 
datetime="2010-03-18T08:00:00" itemprop="#date">18th March 2010 at 8 
o'clock.</time><br />

<span itemprop="#copyright">© Roger Hågensen 2010</span><br />
Keywords: <span itemprop="#keywords"><a 
href="http://example.com/tag/Example/">Example</a>, <a 
href="http://example.com/tag/Microdata/">Microdata</a>, <a 
href="http://example.com/tag/HTML5/">HTML5</a></span></footer>

</article>
</body>
</html>

replaces this...

<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="description" content="This is the first paragraph in the 
document or an aside or some other content perhaps." />

<meta name="author" content="Roger Hågensen" />
<meta name="date" content="2010-03-18T08:00:00" />
<meta name="copyright" content="© Roger Hågensen 2010" />
<meta name="keywords" content="Example, Microdata, HTML5" />
<title>Microdata replacing metadata example.</title>
</head>
<body>
<article>
<header>Section header.</header>
<p>This is the first paragraph in the document or an aside or some other 
content perhaps.</p>

<p>More content here.</p>
<footer>Author: <a href="example.com/author/url/">Roger Hågensen</a> on 
<time datetime="2010-03-18T08:00:00">18th March 2010 at 8 
o'clock.</time><br />

<span>© Roger Hågensen 2010</span><br />
Keywords: <span><a href="http://example.com/tag/Example/">Example</a>, 
<a href="http://example.com/tag/Microdata/">Microdata</a>, <a 
href="http://example.com/tag/HTML5/">HTML5</a></span></footer>

</article>
</body>
</html>

itemprop="#description" would basically need to be reserved in some 
standards document, I just used the # arbitrarily to indicate this 
document in this example.


--
Roger Rescator Hågensen.
Freelancer - http://EmSai.net/



Re: [whatwg] summary tag to help avoid redundancy of meta description tag!?

2010-03-18 Thread Ashley Sheridan
On Thu, 2010-03-18 at 03:37 +0100, Roger Hågensen wrote:

 I searched the list, and looked at the HTML5 briefly and found nothing, 
 nor can I ever recall such.
 So this is both a question and a proposal.
 
 On my own site currently I mostly replicate the first paragraph of an 
 article in my journal as the meta description,
 and write one up for other pages, usually replicating some of the content.
 
 I'm both looking for and want a solution to avoid such redundancy.
 
 The perfect solution would be a summary tag, if you look at the 
 journal articles on my site you can imagine the first paragraph being 
 done like this:
 
 <p><summary>This is just an example, it's a replacement for the old meta 
 description, and is a brief summary (description) of the page 
 (content)</summary></p>
 
 This way the first paragraph in a page would remain unchanged from how 
 it is done today, and a search engine like Google or screen readers etc. 
 would use the summary tag instead
 of the meta description (which is no longer needed at all in cases like 
 this), if more than one summary tag the first is considered the page 
 summary one, while the others are ignored (but still shown as content 
 obviously).
 
 If a new tag is overkill for this, maybe doing it this way instead 
 (using one of the new HTML5 tags):
 <p><header summary>This is just an example, it's a replacement for the 
 old meta description, and is a brief summary (description) of the page 
 (content)</header></p>
 
 I really do not care how this is implemented/speced just as long as it's 
 possible to do.
 
 I began thinking of this recently when it annoyed me that I basically 
 had to enter the same content twice, after looking at my site links in 
 Google,
 and thought to myself...Why do I have to use a meta description to tell 
 Google to show the content in the first paragraph as the default summary 
 of the page link?
 Why can't I simply specify that the first paragraph is the page's meta 
 description? Why am I forced to bloat the page unnecessarily like this?
 
 There is no reason why the meta description can not be the actual content 
 as in most cases I've seen the meta description is supposed to be fully 
 human readable,
 unlike the meta keywords which no search engine bothers with at all any 
 more.
 
 So if the meta description is supposed to be humanly readable and 
 displayable as the page summary to humans in search results,
 why can't it also actually be in the page content?
 
 I can see at least two ways this will be used. The more elegant way I 
 showed, where the first paragraph is a summary/the lead in of the page 
 (and also happens to be the teaser content in my RSS feed as well),
 or at the bottom of a page with possibly linked category tags or similar 
 within it, again allowing dual purpose and reduced redundancy.
 
 To re-iterate, the idea of the summary tag (or however it is 
 implemented) should be to have a human readable summary (or teaser as 
 may be) of a page, which is itself shown in the page,
 but also a replacement for search engines that use the old meta 
 description avoiding redundancy.
 
 End result is (hopefully) less redundancy, and higher quality summary 
 (page description) shown in search engine results, and so on.
 Also allowing people to quickly understand what a page is about by just 
 reading the first paragraph (or be enticed to read more).
 
 Now if something like this already exists/is possible I stand corrected 
 and ask, please tell me how to do that.
 If not then I'd love to see something like this standardized.
 
 BTW! The text in the first paragraph of this very email could for 
 example be the summary/description of this email.
 So if it was html tagged in some way, a mail indexing or search engine 
 could use that as the summary or description view shown to a human user 
 scrolling through archived emails.
 
 Regards,
 Roger.
 


The main problem with that would be that parsers would then need to read
into the body of the page to produce a description of your site. This
might not produce much of an overhead on a one-off basis, but imagine a
parser that is grabbing the description from hundreds or thousands of
pages, then this could become a bit of a problem.

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: [whatwg] summary tag to help avoid redundancy of meta description tag!?

2010-03-18 Thread Julian Reschke

On 18.03.2010 03:37, Roger Hågensen wrote:

I searched the list, and looked at the HTML5 briefly and found nothing,
nor can I ever recall such.
So this is both a question and a proposal.

On my own site currently I mostly replicate the first paragraph of an
article in my journal as the meta description,
and write one up for other pages, usually replicating some of the content.
...


See related W3C bug: http://www.w3.org/Bugs/Public/show_bug.cgi?id=7577.

Best regards, Julian


[whatwg] summary tag to help avoid redundancy of meta description tag!?

2010-03-17 Thread Roger Hågensen
I searched the list, and looked at the HTML5 briefly and found nothing, 
nor can I ever recall such.

So this is both a question and a proposal.

On my own site currently I mostly replicate the first paragraph of an 
article in my journal as the meta description,

and write one up for other pages, usually replicating some of the content.

I'm both looking for and want a solution to avoid such redundancy.

The perfect solution would be a summary tag, if you look at the 
journal articles on my site you can imagine the first paragraph being 
done like this:


<p><summary>This is just an example, it's a replacement for the old meta 
description, and is a brief summary (description) of the page 
(content)</summary></p>


This way the first paragraph in a page would remain unchanged from how 
it is done today, and a search engine like Google or screen readers etc. 
would use the summary tag instead
of the meta description (which is no longer needed at all in cases like 
this), if more than one summary tag the first is considered the page 
summary one, while the others are ignored (but still shown as content 
obviously).


If a new tag is overkill for this, maybe doing it this way instead 
(using one of the new HTML5 tags):
<p><header summary>This is just an example, it's a replacement for the 
old meta description, and is a brief summary (description) of the page 
(content)</header></p>


I really do not care how this is implemented/speced just as long as it's 
possible to do.


I began thinking of this recently when it annoyed me that I basically 
had to enter the same content twice, after looking at my site links in 
Google,
and thought to myself...Why do I have to use a meta description to tell 
Google to show the content in the first paragraph as the default summary 
of the page link?
Why can't I simply specify that the first paragraph is the page's meta 
description? Why am I forced to bloat the page unnecessarily like this?


There is no reason why the meta description can not be the actual content 
as in most cases I've seen the meta description is supposed to be fully 
human readable,
unlike the meta keywords which no search engine bothers with at all any 
more.


So if the meta description is supposed to be humanly readable and 
displayable as the page summary to humans in search results,

why can't it also actually be in the page content?

I can see at least two ways this will be used. The more elegant way I 
showed, where the first paragraph is a summary/the lead in of the page 
(and also happens to be the teaser content in my RSS feed as well),
or at the bottom of a page with possibly linked category tags or similar 
within it, again allowing dual purpose and reduced redundancy.


To re-iterate, the idea of the summary tag (or however it is 
implemented) should be to have a human readable summary (or teaser as 
may be) of a page, which is itself shown in the page,
but also a replacement for search engines that use the old meta 
description avoiding redundancy.


End result is (hopefully) less redundancy, and higher quality summary 
(page description) shown in search engine results, and so on.
Also allowing people to quickly understand what a page is about by just 
reading the first paragraph (or be enticed to read more).


Now if something like this already exists/is possible I stand corrected 
and ask, please tell me how to do that.

If not then I'd love to see something like this standardized.

BTW! The text in the first paragraph of this very email could for 
example be the summary/description of this email.
So if it was html tagged in some way, a mail indexing or search engine 
could use that as the summary or description view shown to a human user 
scrolling through archived emails.


Regards,
Roger.

--
Roger Rescator Hågensen.
Freelancer - http://EmSai.net/