That doesn't seem like a good way to do it. What if there are no
outlinks? This method won't be called at all. And anyway, it would be
called once per outlink, which would multiply the work.
The repeated work is easy to solve, but you are right that it won't
work if there are no outlinks.
Maybe the scoring filter API should change? A distributeScoreToOutlinks
method, which would be called even if there are no outlinks, may be
more useful than the current one:
CrawlDatum distributeScoreToOutlinks(Text fromUrl, List<String>
toUrlList, List<CrawlDatum> datumList, ParseData parseData,
CrawlDatum adjust)
This method gives more control to the plugin: knowing all the outlinks,
the plugin can make more informed decisions. Right now, for example,
there is no way a scoring filter can be sure that it has distributed
all its cash (e.g., if db.score.internal.link is 0.5 and
db.score.external.link is 1.0, the filter will almost always distribute
less than its cash).
This will also work for your case, since you can just ignore the
outlinks and return the adjust datum based on information in the parse
metadata.
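
To make the "all its cash" point concrete, here is a rough sketch of
what a whole-page method could do. It is not Nutch code: Datum is a
hypothetical stand-in for CrawlDatum, the signature is simplified (no
ParseData, and the cash is passed in as a plain float), and the two
weights are just the example values for db.score.internal.link and
db.score.external.link mentioned above.

```java
import java.net.URI;
import java.util.List;

// Hypothetical stand-in for CrawlDatum; only the score is modelled here.
final class Datum {
    float score;
    Datum(float score) { this.score = score; }
}

final class WholePageDistributor {
    // Assumed weights, mirroring db.score.internal.link and db.score.external.link.
    static final float INTERNAL_WEIGHT = 0.5f;
    static final float EXTERNAL_WEIGHT = 1.0f;

    /**
     * Splits cash over all outlinks in proportion to their weights and
     * returns an adjust datum holding whatever was not handed out.
     */
    static Datum distributeScoreToOutlinks(String fromUrl, List<String> toUrlList,
                                           List<Datum> datumList, float cash) {
        if (toUrlList.isEmpty()) {
            return new Datum(cash); // no outlinks: nothing to distribute
        }
        String fromHost = URI.create(fromUrl).getHost();
        float[] weights = new float[toUrlList.size()];
        float total = 0f;
        for (int i = 0; i < weights.length; i++) {
            // A link counts as internal when it stays on the same host.
            boolean internal = fromHost != null
                    && fromHost.equals(URI.create(toUrlList.get(i)).getHost());
            weights[i] = internal ? INTERNAL_WEIGHT : EXTERNAL_WEIGHT;
            total += weights[i];
        }
        // Normalising by the total weight is only possible because every
        // outlink is known up front.
        for (int i = 0; i < weights.length; i++) {
            datumList.get(i).score += cash * weights[i] / total;
        }
        return new Datum(0f); // all cash was handed out
    }
}
```

With a per-outlink callback the total weight is not known when the
first outlink is scored, so this kind of normalisation is not possible
there.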
What do you (and others) think?
I think that good API design here means not assuming so many things
about the plugin behaviour. You are right about this
"distributeScoreToOutlinks()", but IMO it should be called something
like assignScores(). Then you could add an abstract class
DistributingScorePlugin (implementing the interface) which overrides
assignScores() and calls an "abstract protected" method called
distributeScoreToOutlink(). So the code for traversing the outlinks
would be in DistributingScorePlugin.
I would need another class, called ContentBasedScorePlugin. That class
could call an abstract protected method called calculateScore() which
would receive the parsed data and return the score.
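
Roughly, the layering could look like the sketch below. Only
assignScores(), DistributingScorePlugin, distributeScoreToOutlink(),
ContentBasedScorePlugin and calculateScore() are names from this
thread; the interface name ScorePlugin, the ScoreDatum type, the
Map-based parse metadata and the extra outlinkCount parameter are
placeholders made up for illustration.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for CrawlDatum; only the score is modelled here.
final class ScoreDatum {
    float score;
    ScoreDatum(float score) { this.score = score; }
}

// Placeholder name for the scoring plugin interface.
interface ScorePlugin {
    // Called once per page, even when toUrlList is empty.
    ScoreDatum assignScores(String fromUrl, List<String> toUrlList,
                            List<ScoreDatum> datumList,
                            Map<String, String> parseMeta, ScoreDatum adjust);
}

// Owns the outlink traversal; subclasses only score a single outlink.
abstract class DistributingScorePlugin implements ScorePlugin {
    @Override
    public final ScoreDatum assignScores(String fromUrl, List<String> toUrlList,
                                         List<ScoreDatum> datumList,
                                         Map<String, String> parseMeta,
                                         ScoreDatum adjust) {
        for (int i = 0; i < toUrlList.size(); i++) {
            distributeScoreToOutlink(fromUrl, toUrlList.get(i),
                                     datumList.get(i), toUrlList.size());
        }
        return adjust;
    }

    protected abstract void distributeScoreToOutlink(String fromUrl, String toUrl,
                                                     ScoreDatum target,
                                                     int outlinkCount);
}

// Ignores the outlinks and scores the page from its parsed data alone.
abstract class ContentBasedScorePlugin implements ScorePlugin {
    @Override
    public final ScoreDatum assignScores(String fromUrl, List<String> toUrlList,
                                         List<ScoreDatum> datumList,
                                         Map<String, String> parseMeta,
                                         ScoreDatum adjust) {
        adjust.score = calculateScore(parseMeta);
        return adjust;
    }

    protected abstract float calculateScore(Map<String, String> parseMeta);
}
```

The interface itself then assumes nothing about how a plugin arrives at
its scores; each abstract class adds one assumption and exposes a
single method for concrete plugins to fill in.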
What do you think?